Releases · pathwaycom/pathway

10 Apr 07:28

github-actions

v0.21.2

98c0786

Added

Added a synchronization group mechanism that aligns multiple data sources based on selected columns. It is available via pw.io.register_input_synchronization_group.
pw.io.register_input_synchronization_group now supports the following types of columns: pw.DateTimeUtc, pw.DateTimeNaive, pw.DateTimeDuration, and int.
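The idea behind a synchronization group can be illustrated with a small pure-Python model. This is a sketch under assumptions, not Pathway's implementation: the function synchronized_batches and its max_difference argument are hypothetical, each source is assumed to deliver values already sorted by the synchronization column, and plain ints stand in for the supported column types (datetime values with a timedelta limit would work the same way).

```python
from typing import Dict, List, Tuple


def synchronized_batches(
    sources: Dict[str, List[int]], max_difference: int
) -> List[List[Tuple[str, int]]]:
    """Read pre-sorted sources in rounds so that, within one round, no source
    runs more than `max_difference` ahead of the smallest unread value.

    Hypothetical model of a synchronization group, not Pathway code.
    """
    if max_difference < 0:
        raise ValueError("max_difference must be non-negative")
    positions = {name: 0 for name in sources}
    batches: List[List[Tuple[str, int]]] = []
    while True:
        # Next unread value of every source that still has data.
        heads = [
            values[positions[name]]
            for name, values in sources.items()
            if positions[name] < len(values)
        ]
        if not heads:
            return batches
        limit = min(heads) + max_difference
        batch: List[Tuple[str, int]] = []
        for name, values in sources.items():
            # Take every pending value of this source that stays within the limit.
            while positions[name] < len(values) and values[positions[name]] <= limit:
                batch.append((name, values[positions[name]]))
                positions[name] += 1
        batches.append(batch)


# "orders" cannot jump ahead to 10 and 11 while "payments" is still at 3.
print(synchronized_batches({"orders": [1, 2, 10, 11], "payments": [3, 12]}, 5))
```

Without the limit, a fast source could be drained completely before a slower one delivers its first value; the limit keeps the sources advancing together.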

Changed

Enhanced error reporting for runtime errors across most operators, providing a trace that simplifies identifying the root cause.

Fixed

Fixed a problem with list_documents() when no documents are present in the store.
The append-only property of tables created by pw.io.kafka.read is now set correctly.

28 Mar 11:39

github-actions

v0.21.1

ac5969b

Changed

Input connectors now throttle parsing error messages when parsing errors make up more than 10% of parsing attempts.
New flag return_status for the inputs_query method in pw.xpacks.llm.DocumentStore. If set to True, DocumentStore returns the indexing status of each file.
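The throttling rule for parse errors can be pictured with a small counter. The exact policy Pathway applies above the 10% threshold is not described here, so the class ParseErrorThrottler and its behavior (report the first error and then every log_every-th one) are hypothetical; only the 10% threshold comes from the note above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParseErrorThrottler:
    """Hypothetical policy: report every parse error while errors stay at or
    below `threshold` of all attempts; above it, report only the first error
    and then every `log_every`-th one."""

    threshold: float = 0.10
    log_every: int = 100
    attempts: int = 0
    errors: int = 0
    reported: List[str] = field(default_factory=list)

    def record(self, ok: bool, message: str = "") -> None:
        self.attempts += 1
        if ok:
            return
        self.errors += 1
        error_share = self.errors / self.attempts
        if (
            error_share <= self.threshold
            or self.errors == 1
            or self.errors % self.log_every == 0
        ):
            self.reported.append(message)


healthy = ParseErrorThrottler()
for _ in range(95):
    healthy.record(ok=True)
for i in range(5):
    healthy.record(ok=False, message=f"bad row {i}")

broken = ParseErrorThrottler()
for i in range(200):
    broken.record(ok=False, message=f"bad row {i}")

print(len(healthy.reported), len(broken.reported))
```

A mostly healthy stream keeps full error reporting, while a stream that is almost entirely malformed produces only a handful of messages instead of one per row.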

19 Mar 13:46

github-actions

v0.21.0

48ff05d

Added

All Pathway types can now be serialized to CSV using pw.io.csv.write and deserialized back using pw.io.csv.read.
pw.io.csv.read now parses null values in the data when this can be done unambiguously.

Changed

BREAKING: Updated endpoints in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer:
- Deprecated: /v1/pw_list_documents, /v1/pw_ai_answer
- New: /v2/list_documents, /v2/answer
RAG methods of pw.xpacks.llm.question_answering.RAGClient have been renamed and now use the new endpoints. The old methods are deprecated and will be removed in the future:
- pw_ai_summary -> summarize
- pw_ai_answer -> answer
- pw_list_documents -> list_documents
When pw.io.deltalake.write creates a table, it also stores its metadata in the columns of the created Delta table. This metadata can be used by Pathway when reading the table with pw.io.deltalake.read if no schema is specified.
The schema parameter is now optional for pw.io.deltalake.read. If the table was created by Pathway and no schema is specified by the user, the schema is read from the table metadata.
pw.io.deltalake.write now aligns the output metadata with the existing table's metadata, preserving any custom metadata in the sink.
BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the CSV format is used.
BREAKING: The Duration type is now serialized and deserialized as a number of nanoseconds when the CSV format is used.
BREAKING: The tuple and np.ndarray types are now serialized and deserialized as their JSON representations when the CSV format is used.
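The three CSV encodings listed above can be illustrated with the Python standard library. This is a sketch, not Pathway's serializer: encode_row and decode_row are hypothetical helpers, datetime.timedelta stands in for the Duration type (so sub-microsecond precision is lost on the way back), and the row uses the csv module's default quoting, which escapes quote characters by doubling them.

```python
import base64
import csv
import io
import json
from datetime import timedelta


def duration_to_ns(d: timedelta) -> int:
    # timedelta stores days, seconds and microseconds exactly; convert to integer ns.
    return ((d.days * 86_400 + d.seconds) * 1_000_000 + d.microseconds) * 1_000


def ns_to_duration(ns: int) -> timedelta:
    # timedelta has microsecond resolution, so sub-microsecond parts are dropped.
    return timedelta(microseconds=ns // 1_000)


def encode_row(payload: bytes, elapsed: timedelta, pair: tuple) -> list:
    return [
        base64.b64encode(payload).decode("ascii"),  # Bytes -> base64 text
        str(duration_to_ns(elapsed)),               # Duration -> nanoseconds
        json.dumps(list(pair)),                     # tuple -> JSON array
    ]


def decode_row(fields: list) -> tuple:
    payload = base64.b64decode(fields[0])
    elapsed = ns_to_duration(int(fields[1]))
    pair = tuple(json.loads(fields[2]))
    return payload, elapsed, pair


buffer = io.StringIO()
csv.writer(buffer).writerow(encode_row(b"\x00hi", timedelta(seconds=1.5), (1, 'say "hi"')))
buffer.seek(0)
row = next(csv.reader(buffer))
print(row)
print(decode_row(row))
```

Nested tuples would come back as lists from json.loads, so a real reader needs the column type from the schema to restore them exactly.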

Fixed

pw.io.csv.write now correctly escapes quote characters.

07 Mar 08:18

github-actions

v0.20.1

4ef8258

Added

Added RecursiveSplitter
pw.io.deltalake.write now checks that the schema of the target Delta table matches the schema of the Pathway table being written. If the schemas differ, a human-readable error message is produced.
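A schema comparison that produces a readable message might look like the following. It is a hypothetical sketch: schemas are modeled as plain dictionaries mapping column names to type names, and the function schema_mismatch_message and its wording are not Pathway's.

```python
from typing import Dict, Optional


def schema_mismatch_message(
    target: Dict[str, str], source: Dict[str, str]
) -> Optional[str]:
    """Describe differences between the Delta table schema (`target`) and the
    Pathway table schema (`source`); return None when they match."""
    problems = []
    for name in sorted(target.keys() - source.keys()):
        problems.append(f"column {name!r} exists in the Delta table but not in the Pathway table")
    for name in sorted(source.keys() - target.keys()):
        problems.append(f"column {name!r} exists in the Pathway table but not in the Delta table")
    for name in sorted(target.keys() & source.keys()):
        if target[name] != source[name]:
            problems.append(
                f"column {name!r} has type {target[name]} in the Delta table "
                f"but {source[name]} in the Pathway table"
            )
    if not problems:
        return None
    return "Schema mismatch:\n" + "\n".join(f"  - {p}" for p in problems)


print(schema_mismatch_message({"id": "int", "ts": "timestamp"}, {"id": "str", "value": "float"}))
```

Listing every difference at once, rather than failing on the first one, lets the user fix the schema in a single pass.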

25 Feb 08:10

github-actions

v0.20.0

e4d6d91

[0.20.0] - 2025-02-25

Added

Added structure-aware chunking for DoclingParser.
Added table_parsing_strategy for DoclingParser.
Column expressions as_int(), as_float(), as_str(), and as_bool() now accept additional arguments, unwrap and default, to simplify null handling.
Added support for Python tuples in expressions.
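The null handling that unwrap and default are meant to simplify can be modeled in plain Python. This is an assumption about the semantics, not Pathway's code: the standalone function as_int below assumes that a null is replaced by default when one is given, treated as an error when unwrap=True, and passed through otherwise.

```python
from typing import Any, Optional


def as_int(value: Any, *, unwrap: bool = False, default: Optional[int] = None) -> Optional[int]:
    """Plain-Python model of the assumed null handling (not Pathway's code)."""
    if value is not None:
        # Non-null values are simply converted.
        return int(value)
    if default is not None:
        # A null is replaced by the fallback value.
        return default
    if unwrap:
        # With unwrap=True the result is expected to be non-null.
        raise ValueError("unexpected null value")
    # Otherwise the null is passed through unchanged.
    return None


print(as_int("7"), as_int(None), as_int(None, default=0))
```

The benefit is that the fallback lives in the same expression as the conversion, instead of requiring a separate coalescing step afterwards.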

Changed

BREAKING: Changed the argument in DoclingParser from parse_images (bool) into image_parsing_strategy (Literal["llm"] | None).
BREAKING: The doc_post_processors argument in pw.xpacks.llm.document_store.DocumentStore no longer accepts pw.UDF.
Improved error messages when using pathway spawn with multiple workers: an error is now printed only by the worker that encountered it.

Fixed

The doc_post_processors argument in pw.xpacks.llm.document_store.DocumentStore had no effect. This is now fixed.

20 Feb 13:12

github-actions

v0.19.0

2118468

Added

LLMReranker now supports custom prompts and custom response parsers, allowing ranking scales other than the default 1-5.
pw.io.kafka.write and pw.io.nats.write now support a ColumnReference as the topic name. When a ColumnReference is provided, each message's topic is determined by the corresponding column value.
pw.io.python.write now accepts a ConnectorObserver as an alternative to pw.io.subscribe.
pw.io.iceberg.read and pw.io.iceberg.write now support S3 as a data backend and AWS Glue catalog implementations.
All output connectors now support the sort_by field for ordering output within a single minibatch.
A new UDF executor, pw.udfs.fully_async_executor, for creating non-blocking asynchronous UDFs whose results can be returned at a later processing time.
A Future data type representing the results of fully asynchronous UDFs.
A pw.Table.await_futures method for waiting for the results of fully asynchronous UDFs.
pw.io.deltalake.write now supports partition columns specification.
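Per-row topic selection and sort_by can be pictured together on a single minibatch, without any Kafka or NATS client. This is a hypothetical sketch: the function route_minibatch and the column names are illustrative, and it assumes rows are ordered by the sort_by column before being grouped by topic.

```python
from collections import defaultdict
from typing import Dict, List


def route_minibatch(rows: List[dict], topic_column: str, sort_by: str) -> Dict[str, List[dict]]:
    """Order one minibatch by `sort_by`, then assign each row to the topic
    named in its `topic_column` (hypothetical sketch, not Pathway code)."""
    per_topic: Dict[str, List[dict]] = defaultdict(list)
    # sorted() is stable, so rows with equal keys keep their original order.
    for row in sorted(rows, key=lambda r: r[sort_by]):
        per_topic[row[topic_column]].append(row)
    return dict(per_topic)


batch = [
    {"topic": "alerts", "ts": 3, "msg": "c"},
    {"topic": "metrics", "ts": 1, "msg": "a"},
    {"topic": "alerts", "ts": 2, "msg": "b"},
]
print(route_minibatch(batch, topic_column="topic", sort_by="ts"))
```

The ordering guarantee applies only inside one minibatch; rows from later minibatches are not reordered against earlier ones.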

Changed

BREAKING: Changed the interface of LLMReranker: the use_logit_bias, cache_strategy, retry_strategy and kwargs arguments are no longer supported.
BREAKING: LLMReranker no longer inherits from pw.UDF.
BREAKING: pw.stdlib.utils.AsyncTransformer.output_table now returns a table whose columns have the Future data type.
pw.io.deltalake.read can now read append-only tables without requiring explicit specification of primary key fields.

07 Feb 16:10

github-actions

v0.18.0

51a5660

Added

pw.io.postgres.write and pw.io.postgres.write_snapshot now handle serialization of PyObjectWrapper and Timedelta properly.
New chunking options in pathway.xpacks.llm.parsers.UnstructuredParser.
All Pathway types can now be serialized into JSON and consistently deserialized back.
table.col.dt.to_duration for converting an integer into a pw.Duration.
pw.Json now supports storing datetime and duration type values in ISO format.

Changed

BREAKING: Changed the interface of UnstructuredParser.
BREAKING: The Pointer type is now serialized and deserialized as a string field in Iceberg and Delta Lake.
BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the JSON format is used. A string field is used to store the encoded contents.
BREAKING: The Array type is now serialized and deserialized as an object with two fields: shape denoting the shape of the stored multi-dimensional array and elements denoting the elements of the flattened array.
BREAKING: Marked the package as py.typed to indicate support for type hints.
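The JSON shapes described above for arrays and bytes can be sketched in plain Python: an array becomes an object with shape and elements, and bytes become a base64 string. Row-major (C-order) flattening and the helper names array_to_json, json_to_array and bytes_to_json are assumptions, not taken from Pathway.

```python
import base64
import json
from typing import Any, List


def array_to_json(nested: List[Any]) -> dict:
    """Encode a rectangular nested list as {"shape": ..., "elements": ...}."""
    shape: List[int] = []
    level: Any = nested
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0] if level else None
    elements: List[Any] = []

    def flatten(item: Any) -> None:
        # Row-major traversal: the last index varies fastest.
        if isinstance(item, list):
            for sub in item:
                flatten(sub)
        else:
            elements.append(item)

    flatten(nested)
    return {"shape": shape, "elements": elements}


def json_to_array(obj: dict) -> List[Any]:
    """Rebuild the nested list from its shape and flattened elements."""
    shape, elements = obj["shape"], obj["elements"]

    def build(dims: List[int], offset: int) -> Any:
        if not dims:
            return elements[offset]
        step = 1
        for d in dims[1:]:
            step *= d  # number of elements in one sub-array
        return [build(dims[1:], offset + i * step) for i in range(dims[0])]

    return build(shape, 0)


def bytes_to_json(data: bytes) -> str:
    # Bytes are stored as a base64-encoded string field.
    return base64.b64encode(data).decode("ascii")


encoded = array_to_json([[1, 2, 3], [4, 5, 6]])
print(json.dumps(encoded), bytes_to_json(b"hi"))
```

Storing the shape separately keeps the elements in a flat list, which any JSON consumer can read without knowing the array's dimensionality in advance.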

Removed

BREAKING: Removed undocumented license_key argument from pw.run and pw.run_all methods. Instead, pw.set_license_key should be used.

31 Jan 12:07

github-actions

v0.17.0

36b9ec2

Added

pw.io.iceberg.read method for reading Apache Iceberg tables into Pathway.
Methods pw.io.postgres.write and pw.io.postgres.write_snapshot now accept an additional argument, init_mode, which allows initializing the table before writing.
pw.io.deltalake.read now supports serialization and deserialization for all Pathway data types.
New parser pathway.xpacks.llm.parsers.DoclingParser supporting parsing of PDFs with tables and images.
Output connectors now include an optional name parameter. If provided, this name will appear in logs and monitoring dashboards.
Automatic naming for input and output connectors has been enhanced.

Changed

BREAKING: pw.io.deltalake.read now requires explicit specification of primary key fields.
BREAKING: pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now returns a dictionary from the pw_ai_answer endpoint.
pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer can optionally return context documents from the pw_ai_answer endpoint.
BREAKING: When using delay in temporal behavior, current time is updated immediately, not in the next batch.
BREAKING: The Pointer type is now serialized to Delta Tables as raw bytes.
pw.io.kafka.write now allows specifying the key and headers for the JSON and CSV data formats.
persistent_id parameter in connectors has been renamed to name. This new name parameter allows you to assign names to connectors, which will appear in logs and monitoring dashboards.
Changed names of parsers to be more consistent: ParseUnstructured -> UnstructuredParser, ParseUtf8 -> Utf8Parser. ParseUnstructured and ParseUtf8 are now deprecated.

Fixed

The generate_class method in Schema now correctly renders columns of UnionType and None types.
Fixed a bug in delay in temporal behavior that, in a specific situation, could emit a single entry twice.
pw.io.postgres.write_snapshot now correctly handles tables that only have primary key columns.

Removed

BREAKING: pw.indexing.build_sorted_index, pw.indexing.retrieve_prev_next_values, pw.indexing.sort_from_index and pw.indexing.SortedIndex are removed. Sorting is now done with pw.Table.sort.
BREAKING: Removed deprecated methods pw.Table.unsafe_promise_same_universe_as, pw.Table.unsafe_promise_universes_are_pairwise_disjoint, pw.Table.unsafe_promise_universe_is_subset_of, pw.Table.left_join, pw.Table.right_join, pw.Table.outer_join, pw.stdlib.utils.AsyncTransformer.result.
BREAKING: Removed deprecated column _pw_shard in the result of windowby.
BREAKING: Removed deprecated functions pw.debug.parse_to_table, pw.udf_async, pw.reducers.npsum, pw.reducers.int_sum, pw.stdlib.utils.col.flatten_column.
BREAKING: Removed deprecated module pw.asynchronous.
BREAKING: Removed deprecated access to functions from pw.io in pw.
BREAKING: Removed deprecated classes pw.UDFSync, pw.UDFAsync.
BREAKING: Removed class pw.xpacks.llm.parsers.OpenParse. Its functionality has been replaced by pw.xpacks.llm.parsers.DoclingParser.
BREAKING: Removed deprecated arguments from input connectors: value_columns, primary_key, types, default_values. Schema should be used instead.
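Code that used the removed pw.indexing helpers to find neighboring rows now relies on pw.Table.sort, which, as far as we understand Pathway's documentation, yields prev and next pointers for each row in key order. The following pure-Python model of that output is a sketch: the function prev_next is hypothetical, and string row ids stand in for Pathway pointers.

```python
from typing import Any, Dict, Hashable, Optional, Tuple


def prev_next(
    keys: Dict[Hashable, Any]
) -> Dict[Hashable, Tuple[Optional[Hashable], Optional[Hashable]]]:
    """Map each row id to (previous row id, next row id) in ascending key order."""
    order = sorted(keys, key=lambda row_id: keys[row_id])
    links = {}
    for i, row_id in enumerate(order):
        prev_id = order[i - 1] if i > 0 else None           # first row has no predecessor
        next_id = order[i + 1] if i + 1 < len(order) else None  # last row has no successor
        links[row_id] = (prev_id, next_id)
    return links


print(prev_next({"r1": 30, "r2": 10, "r3": 20}))
```

Representing order as links between rows, rather than as positions, means that inserting one row only changes the links of its two neighbors.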

09 Jan 15:14

github-actions

v0.16.4

5d30c34

Fixed

Google Drive connector in static mode now displays correctly in Jupyter visualizations.

02 Jan 14:38

github-actions

v0.16.3

eb36786

Added

pw.io.iceberg.write method for writing Pathway tables into Apache Iceberg.

Changed

Values of non-deterministic UDFs are no longer stored in append-only tables.
pw.Table.ix now has a better runtime error message that includes the ID of the missing row.

Fixed

Temporal behaviors in temporal operators (windowby, interval_join) now consume no CPU when no data passes through them.
