Merged
39 changes: 39 additions & 0 deletions packages/pynumaflow/Makefile
@@ -34,3 +34,42 @@ proto:
	poetry run python3 -m grpc_tools.protoc -Ipynumaflow/proto/sideinput=pynumaflow/proto/sideinput -Ipynumaflow/proto/common=pynumaflow/proto/common --pyi_out=. --python_out=. --grpc_python_out=. pynumaflow/proto/sideinput/*.proto
	poetry run python3 -m grpc_tools.protoc -Ipynumaflow/proto/sourcer=pynumaflow/proto/sourcer -Ipynumaflow/proto/common=pynumaflow/proto/common --pyi_out=. --python_out=. --grpc_python_out=. pynumaflow/proto/sourcer/*.proto
	poetry run python3 -m grpc_tools.protoc -Ipynumaflow/proto/accumulator=pynumaflow/proto/accumulator -Ipynumaflow/proto/common=pynumaflow/proto/common --pyi_out=. --python_out=. --grpc_python_out=. pynumaflow/proto/accumulator/*.proto


# ============================================================================
# Documentation targets
# ============================================================================

.PHONY: docs docs-serve docs-build docs-deploy-dev docs-deploy-version docs-list docs-set-default docs-delete docs-setup

docs: docs-serve ## Alias for docs-serve

docs-serve: ## Serve documentation locally with hot-reload (http://localhost:8000)
	poetry run mkdocs serve

docs-build: ## Build documentation locally
	poetry run mkdocs build

docs-deploy-dev: ## Deploy dev docs to docs-site branch
	poetry run mike deploy -b docs-site dev --push

docs-deploy-version: ## Deploy versioned docs (usage: make docs-deploy-version VERSION=0.11)
ifndef VERSION
	$(error VERSION is required. Usage: make docs-deploy-version VERSION=0.11)
endif
	poetry run mike deploy -b docs-site $(VERSION) latest --update-aliases --push

docs-list: ## List all deployed documentation versions
	poetry run mike list -b docs-site

docs-set-default: ## Set the default documentation version to 'latest'
	poetry run mike set-default -b docs-site latest --push

docs-delete: ## Delete a documentation version (usage: make docs-delete VERSION=0.10)
ifndef VERSION
	$(error VERSION is required. Usage: make docs-delete VERSION=0.10)
endif
	poetry run mike delete -b docs-site $(VERSION) --push

docs-setup: ## Install documentation dependencies
	poetry install --with docs --no-root
18 changes: 9 additions & 9 deletions packages/pynumaflow/README.md
@@ -23,7 +23,7 @@ To build the package locally, run the following command from the root of the pro

```bash
make setup
````
```

To run unit tests:
```bash
@@ -57,22 +57,21 @@ There are different types of gRPC server mechanisms which can be used to serve t
These have different functionalities and are used for different use cases.

Currently we support the following server types:

- Sync Server
- Asynchronous Server
- MultiProcessing Server

Not all of the above are supported for all UDFs, UDSource and UDSinks.

For each of the UDFs, UDSource and UDSinks, there are separate classes for each of the server types.
This helps in keeping the interface simple and easy to use, and the user can start the specific server type based
on the use case.
This helps in keeping the interface simple and easy to use, and the user can start the specific server type based on the use case.


#### SyncServer

Synchronous Server is the simplest server type. It is a multithreaded server which can be used for simple UDFs and UDSinks.
Here the server will invoke the handler function for each message. The messaging is synchronous and the server will wait
for the handler to return before processing the next message.
Here the server will invoke the handler function for each message. The messaging is synchronous and the server will wait for the handler to return before processing the next message.

```
grpc_server = MapServer(handler)
@@ -83,13 +82,13 @@ grpc_server = MapServer(handler)
Asynchronous Server is a multithreaded server which can be used for UDFs which are asynchronous. Here we utilize the asynchronous capabilities of Python to process multiple messages concurrently. The server will invoke the handler function for each message. The messaging is asynchronous and the server will not wait for the handler to return before processing the next message. Thus this server type is useful for UDFs which are asynchronous.
The handler function for such a server should be an async function.

```
```py
grpc_server = MapAsyncServer(handler)
```
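
As a rough illustration of why an async handler helps, the sketch below uses only the standard library's `asyncio` and does not import pynumaflow. The names `handler` and `process_all` are hypothetical and chosen for this example only. While one call is awaiting I/O, the event loop is free to run the others.

```python
import asyncio


async def handler(payload: bytes) -> bytes:
    # Simulated non-blocking I/O (for example, a network call).
    await asyncio.sleep(0.01)
    return payload.upper()


async def process_all(payloads: list[bytes]) -> list[bytes]:
    # Schedule every handler call at once; gather returns results in input order.
    return await asyncio.gather(*(handler(p) for p in payloads))


results = asyncio.run(process_all([b"a", b"b", b"c"]))
```

A handler that blocks (for example with `time.sleep`) would stall the event loop, which is why the handler for this server type should be an `async` function.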

#### MultiProcessServer

MultiProcess Server is a multi process server which can be used for UDFs which are CPU intensive. Here we utilize the multi process capabilities of Python to process multiple messages in parallel by forking multiple servers in different processes.
MultiProcess Server is a multi process server which can be used for UDFs which are CPU intensive. Here we utilize the multi process capabilities of Python to process multiple messages in parallel by forking multiple servers in different processes.
The server will invoke the handler function for each message. Individually at the server level the messaging is synchronous and the server will wait for the handler to return before processing the next message. But since we have multiple servers running in parallel, the overall messaging also executes in parallel.

This could be an alternative to creating multiple replicas of the same UDF container as here we are using the multi processing capabilities of the system to process multiple messages in parallel but within the same container.
@@ -140,7 +139,8 @@ should follow the same signature.

For using the class based handlers the user can inherit from the base handler class for each of the functionalities and implement the handler function.
The base handler class for each of the functionalities has the same signature as the handler function for the respective server type.
The list of base handler classes for each of the functionalities is given below -
The list of base handler classes for each of the functionalities is given below:

- UDFs
  - Map
    - Mapper
@@ -159,5 +159,5 @@ The list of base handler classes for each of the functionalities is given below
- SideInput
  - SideInput

More details about the signature of the handler function for each of the server types is given in the
More details about the signature of the handler function for each of the server types are given in the
documentation of the respective server type.
15 changes: 15 additions & 0 deletions packages/pynumaflow/docs/api/accumulator.md
@@ -0,0 +1,15 @@
# Accumulator

This module offers tools for accumulating and processing data while managing state. With it, you can:

- Accumulate data over time
- Maintain state across messages
- Process accumulated data
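
The idea of keeping state across messages can be sketched without the library. The example below is a plain-Python illustration, not the pynumaflow API; `RunningAverage` is a hypothetical class invented for this sketch.

```python
class RunningAverage:
    """Keeps a count and a total across calls (the accumulated state)."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def add(self, value: float) -> float:
        # Update the state with the new message, then process the accumulated data.
        self.count += 1
        self.total += value
        return self.total / self.count


acc = RunningAverage()
averages = [acc.add(v) for v in (10.0, 20.0, 30.0)]
```

Each call sees the effect of all earlier messages, which is the property the list above describes.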

## Classes

::: pynumaflow.accumulator
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
11 changes: 11 additions & 0 deletions packages/pynumaflow/docs/api/batchmapper.md
@@ -0,0 +1,11 @@
# Batch Mapper

The Batch Mapper module offers tools for building BatchMap UDFs, allowing you to process multiple messages simultaneously. This enables more efficient handling of workloads such as bulk API requests or batch database operations by grouping messages and processing them together in a single operation.
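
The benefit of batching can be sketched in plain Python. This is a conceptual example that does not use pynumaflow; `bulk_lookup` is a hypothetical stand-in for a bulk API or database call, and `batch_handler` is an illustrative name.

```python
call_log: list[int] = []  # records the size of every bulk call, for illustration


def bulk_lookup(ids: list[int]) -> dict[int, str]:
    # Hypothetical bulk operation: one round trip for many ids.
    call_log.append(len(ids))
    return {i: f"user-{i}" for i in ids}


def batch_handler(batch: list[dict]) -> list[dict]:
    # One bulk call for the whole batch instead of one call per message.
    names = bulk_lookup([m["id"] for m in batch])
    return [{**m, "name": names[m["id"]]} for m in batch]


enriched = batch_handler([{"id": 1}, {"id": 2}, {"id": 3}])
```

Three messages are enriched with a single call to the backing service, rather than three separate calls.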

## Classes

::: pynumaflow.batchmapper
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
18 changes: 18 additions & 0 deletions packages/pynumaflow/docs/api/index.md
@@ -0,0 +1,18 @@
# API Reference

This section provides detailed API documentation for all pynumaflow modules.

## Modules

| Module | Description |
|--------|-------------|
| [Sourcer](sourcer.md) | User Defined Source for custom data sources |
| [Source Transformer](sourcetransformer.md) | Transform data at ingestion |
| [Mapper](mapper.md) | Map UDF for transforming messages one at a time |
| [Map Streamer](mapstreamer.md) | MapStream UDF for streaming results as they're produced |
| [Batch Mapper](batchmapper.md) | BatchMap UDF for processing messages in batches |
| [Sinker](sinker.md) | User Defined Sink for custom data destinations |
| [Reducer](reducer.md) | Reduce UDF for aggregating messages by key and time window |
| [Reduce Streamer](reducestreamer.md) | Stream reduce results incrementally |
| [Accumulator](accumulator.md) | Accumulate and process data with state |
| [Side Input](sideinput.md) | Inject external data into UDFs |
16 changes: 16 additions & 0 deletions packages/pynumaflow/docs/api/mapper.md
@@ -0,0 +1,16 @@
# Mapper

The Mapper module provides classes and functions for implementing Map UDFs that transform messages one at a time.
Map is the most common UDF type. It receives one message at a time and can return:

- One message (1:1 transformation)
- Multiple messages (fan-out)
- No messages (filter/drop)
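
The three outcomes listed above can be sketched with a plain Python function. This is a conceptual illustration only: it returns a list of byte strings rather than the library's message types, and `map_handler` is a hypothetical name.

```python
def map_handler(keys: list[str], value: bytes) -> list[bytes]:
    text = value.decode("utf-8")
    if not text.strip():
        # No messages: drop empty payloads.
        return []
    if "," in text:
        # Multiple messages: fan out one output per comma-separated part.
        return [part.strip().encode("utf-8") for part in text.split(",")]
    # One message: a 1:1 transformation.
    return [text.upper().encode("utf-8")]


one = map_handler([], b"hello")
many = map_handler([], b"a, b, c")
none = map_handler([], b"   ")
```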

## Classes

::: pynumaflow.mapper
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
12 changes: 12 additions & 0 deletions packages/pynumaflow/docs/api/mapstreamer.md
@@ -0,0 +1,12 @@
# Map Streamer

The Map Streamer module provides classes and functions for implementing MapStream UDFs that stream results as they're produced.
Unlike regular Map which returns all messages at once, Map Stream yields messages one at a time as they're ready, reducing latency for downstream consumers.
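
The difference between returning everything at once and yielding as results become ready can be sketched with a plain Python generator. This does not use pynumaflow; `map_all` and `map_stream` are hypothetical names for this illustration.

```python
from typing import Iterator


def map_all(value: str) -> list[str]:
    # Regular map: the caller sees nothing until the whole list is built.
    return [word.upper() for word in value.split()]


def map_stream(value: str) -> Iterator[str]:
    # Streaming map: each result is handed over as soon as it is ready.
    for word in value.split():
        yield word.upper()


stream = map_stream("alpha beta gamma")
first = next(stream)   # available before the remaining words are processed
rest = list(stream)
```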

## Classes

::: pynumaflow.mapstreamer
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
12 changes: 12 additions & 0 deletions packages/pynumaflow/docs/api/reducer.md
@@ -0,0 +1,12 @@
# Reducer

The Reducer module provides classes and functions for implementing Reduce UDFs that aggregate messages by key within time windows.
It's used for operations like counting, summing, or computing statistics over groups of messages.
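
Aggregating by key within time windows can be sketched in plain Python. The example below assumes fixed, non-overlapping windows and event times given in seconds; it does not use pynumaflow, and `count_by_key_and_window` is a hypothetical helper.

```python
from collections import defaultdict


def count_by_key_and_window(
    events: list[tuple[str, float]], window_seconds: int = 60
) -> dict[tuple[str, int], int]:
    counts: dict[tuple[str, int], int] = defaultdict(int)
    for key, event_time in events:
        # Align the event time to the start of its fixed window.
        window_start = int(event_time // window_seconds) * window_seconds
        counts[(key, window_start)] += 1
    return dict(counts)


events = [("a", 5), ("a", 30), ("b", 59), ("a", 61)]
counts = count_by_key_and_window(events)
```

Each `(key, window_start)` pair gets its own count, so the event for key `a` at 61 seconds lands in a different window from the two earlier ones.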

## Classes

::: pynumaflow.reducer
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
12 changes: 12 additions & 0 deletions packages/pynumaflow/docs/api/reducestreamer.md
@@ -0,0 +1,12 @@
# Reduce Streamer

The Reduce Streamer module provides classes and functions for implementing ReduceStream UDFs that emit results incrementally during reduction.
Unlike regular Reduce which outputs only when the window closes, Reduce Stream emits results as they're computed. This is useful for early alerts or real-time dashboards.
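
The contrast between a final result and incremental results can be sketched with a plain Python generator. This is a conceptual illustration that does not use pynumaflow; `reduce_sum` and `reduce_stream_sum` are hypothetical names.

```python
from typing import Iterable, Iterator


def reduce_sum(values: Iterable[int]) -> int:
    # Regular reduce: a single result once all values in the window are seen.
    return sum(values)


def reduce_stream_sum(values: Iterable[int]) -> Iterator[int]:
    # Streaming reduce: emit the running total after every value.
    total = 0
    for value in values:
        total += value
        yield total


final = reduce_sum([1, 2, 3])
partials = list(reduce_stream_sum([1, 2, 3]))
```

A consumer of the partial results could raise an alert as soon as a running total crosses a threshold, without waiting for the last value.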

## Classes

::: pynumaflow.reducestreamer
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
11 changes: 11 additions & 0 deletions packages/pynumaflow/docs/api/sideinput.md
@@ -0,0 +1,11 @@
# Side Input

Side Input allows you to inject external data into your UDFs. This is useful for configuration, lookup tables, or any data that UDFs need but isn't part of the main data stream.

## Classes

::: pynumaflow.sideinput
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
11 changes: 11 additions & 0 deletions packages/pynumaflow/docs/api/sinker.md
@@ -0,0 +1,11 @@
# Sinker

The Sinker module provides classes and functions for implementing User Defined Sinks that write processed data to external systems (database, Kafka topic, etc.).

## Classes

::: pynumaflow.sinker
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
11 changes: 11 additions & 0 deletions packages/pynumaflow/docs/api/sourcer.md
@@ -0,0 +1,11 @@
# Sourcer

The Sourcer module provides classes and functions for implementing User Defined Sources that produce messages for Numaflow pipelines.

## Classes

::: pynumaflow.sourcer
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
18 changes: 18 additions & 0 deletions packages/pynumaflow/docs/api/sourcetransformer.md
@@ -0,0 +1,18 @@
# Source Transformer

The Source Transformer module provides classes and functions for implementing Source Transform UDFs that transform data immediately after it's read from a source.
Source Transform is useful for:

- Parsing/deserializing data at ingestion
- Filtering messages early
- Assigning event times
- Adding metadata
- Routing messages with tags
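
Several of the uses above can be combined in one small sketch. The example is plain Python and does not use pynumaflow; `transform`, the `ts` and `level` fields, and the tag names are all assumptions made for illustration. It assumes each payload is a JSON object with a `ts` field holding epoch seconds.

```python
import json
from datetime import datetime, timezone


def transform(raw: bytes):
    # Parse at ingestion; drop payloads that are not valid JSON (early filtering).
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Assign the event time from the payload instead of the read time.
    event_time = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    # Choose a tag that downstream routing could act on.
    tags = ["error"] if record.get("level") == "error" else ["normal"]
    return record, event_time, tags


dropped = transform(b"not json")
record, event_time, tags = transform(b'{"ts": 0, "level": "error"}')
```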

## Classes

::: pynumaflow.sourcetransformer
    options:
      show_root_heading: false
      show_root_full_path: false
      members_order: source
111 changes: 111 additions & 0 deletions packages/pynumaflow/docs/changelog.md
@@ -0,0 +1,111 @@
# Changelog

All notable changes to pynumaflow will be documented on this page.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.11.0] - Latest

### Added

- Accumulator functionality for stateful data accumulation
- ReduceStream for streaming reduce results
- Improved type hints throughout the codebase

### Changed

- Updated dependencies to latest versions
- Enhanced error handling in gRPC servers

### Fixed

- Various bug fixes and performance improvements

---

## [0.10.0]

### Added

- BatchMap support for processing messages in batches
- MultiProcess server support for Map and SourceTransform
- Improved async server implementations

### Changed

- Refactored server architecture for better performance
- Updated protobuf definitions

---

## [0.9.0]

### Added

- MapStream functionality
- Side Input support
- Enhanced metadata in Datum objects

### Changed

- Improved connection handling
- Better error messages

---

## [0.8.0]

### Added

- User Defined Source support
- Source Transform functionality
- Headers support in messages

### Changed

- Updated gRPC communication protocol
- Enhanced logging

---

## [0.7.0]

### Added

- Async server support for Map and Reduce
- User Defined Sink functionality
- Tagging support for message routing

### Changed

- Improved memory management
- Better type annotations

---

## [0.6.0]

### Added

- Basic Map and Reduce UDF support
- gRPC server implementation
- Initial documentation

---

## Upgrade Guide

### Upgrading to 0.11.0

No breaking changes. New features are additive.

### Upgrading to 0.10.0

If using custom server configurations, review the new server options.

---

## Release Notes

For detailed release notes, see the [GitHub Releases](https://github.com/numaproj/numaflow-python/releases) page.