Releases: pathwaycom/pathway

v0.21.2

10 Apr 07:28

Added

  • Added synchronization group mechanism to align multiple data sources based on selected columns. It can be accessed with pw.io.register_input_synchronization_group.
  • pw.io.register_input_synchronization_group now supports the following types of columns: pw.DateTimeUtc, pw.DateTimeNaive, pw.DateTimeDuration, and int.
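The idea behind a synchronization group can be illustrated with a toy model in plain Python. This is not Pathway's implementation and does not use its API. It assumes that each source yields entries already sorted by the synchronized column, and that a tolerance (here called max_difference) limits how far an entry may run ahead of the lowest value still pending across all sources.

```python
from typing import List, Sequence, Tuple


def synchronized_read_order(
    sources: Sequence[Sequence[int]], max_difference: int
) -> List[Tuple[int, int]]:
    """Toy model: return (source_index, value) pairs in the order they are read.

    Each source is assumed to be sorted by the synchronized column. An entry may
    be read only if its value is at most max_difference ahead of the smallest
    value still pending across all sources.
    """
    positions = [0] * len(sources)
    order: List[Tuple[int, int]] = []
    while True:
        pending = [
            (i, src[positions[i]])
            for i, src in enumerate(sources)
            if positions[i] < len(src)
        ]
        if not pending:
            return order
        lowest = min(value for _, value in pending)
        # The source holding the lowest value always qualifies, so the loop progresses.
        for i, value in pending:
            if value <= lowest + max_difference:
                order.append((i, value))
                positions[i] += 1
                break


print(synchronized_read_order([[1, 2, 10], [3, 4, 5]], max_difference=2))
```

In this model, source 0 holds back its entry with value 10 until source 1 has read up to 5, so neither source gets far ahead of the other.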

Changed

  • Enhanced error reporting for runtime errors across most operators, providing a trace that simplifies identifying the root cause.

Fixed

  • Fixed a bug in list_documents() when no documents are present in the store.
  • The append-only property of tables created by pw.io.kafka.read is now set correctly.

v0.21.1

28 Mar 11:39

Changed

  • Input connectors now throttle parsing error messages when parsing errors make up more than 10% of all parsing attempts.
  • New return_status flag for the inputs_query method in pw.xpacks.llm.DocumentStore. If set to True, DocumentStore returns the indexing status of each file.

v0.21.0

19 Mar 13:46

Added

  • All Pathway types can now be serialized to CSV using pw.io.csv.write and deserialized back using pw.io.csv.read.
  • pw.io.csv.read now parses null values in data when this can be done unambiguously.

Changed

  • BREAKING: Updated endpoints in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer:
    • Deprecated: /v1/pw_list_documents, /v1/pw_ai_answer
    • New: /v2/list_documents, /v2/answer
  • RAG methods under pw.xpacks.llm.question_answering.RAGClient have been renamed and now use the new endpoints. The old methods are deprecated and will be removed in a future release.
    • pw_ai_summary -> summarize
    • pw_ai_answer -> answer
    • pw_list_documents -> list_documents
  • When pw.io.deltalake.write creates a table, it also stores its metadata in the columns of the created Delta table. This metadata can be used by Pathway when reading the table with pw.io.deltalake.read if no schema is specified.
  • The schema parameter is now optional for pw.io.deltalake.read. If the table was created by Pathway and the user does not specify a schema, it is read from the table metadata.
  • pw.io.deltalake.write now aligns the output metadata with the existing table's metadata, preserving any custom metadata in the sink.
  • BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the CSV format is used.
  • BREAKING: The Duration type is now serialized and deserialized as a number of nanoseconds when the CSV format is used.
  • BREAKING: The tuple and np.ndarray types are now serialized and deserialized as their JSON representations when the CSV format is used.
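The BREAKING CSV changes above can be pictured with the standard library alone. The sketch below is an assumption about what the resulting cells look like, not Pathway's own serializer: Bytes become base64 text, a Duration becomes an integer count of nanoseconds, and a tuple becomes its JSON array text. Python's timedelta only has microsecond resolution, so the nanosecond counts it produces are always multiples of 1000.

```python
import base64
import csv
import io
import json
from datetime import timedelta
from typing import List, Tuple


def duration_to_ns(d: timedelta) -> int:
    # timedelta stores days, seconds and microseconds; convert everything to ns.
    return (d.days * 86_400 + d.seconds) * 1_000_000_000 + d.microseconds * 1_000


def ns_to_duration(ns: int) -> timedelta:
    # Sub-microsecond precision is dropped (floor division), since timedelta cannot hold it.
    return timedelta(microseconds=ns // 1_000)


def encode_row(payload: bytes, delay: timedelta, pair: tuple) -> List[str]:
    return [
        base64.b64encode(payload).decode("ascii"),  # Bytes -> base64 text
        str(duration_to_ns(delay)),                  # Duration -> nanoseconds
        json.dumps(list(pair)),                      # tuple -> JSON array
    ]


def decode_row(cells: List[str]) -> Tuple[bytes, timedelta, tuple]:
    payload, delay, pair = cells
    return (
        base64.b64decode(payload),
        ns_to_duration(int(delay)),
        tuple(json.loads(pair)),
    )


buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["payload", "delay", "pair"])
writer.writerow(encode_row(b"\x00\xffhi", timedelta(seconds=1, microseconds=5), (1, "a")))

# The csv module quotes the JSON cell and doubles its inner quote characters.
print(buf.getvalue())
```

Note that the JSON text of a tuple contains both commas and quote characters, so a correct CSV writer has to quote the cell and escape the quotes inside it.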

Fixed

  • pw.io.csv.write now correctly escapes quote characters.

v0.20.1

07 Mar 08:18

Added

  • Added RecursiveSplitter.
  • pw.io.deltalake.write now checks that the schema of the target Delta table corresponds to the schema of the Pathway table being written. If the schemas differ, a human-readable error message is produced.
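The kind of comparison this check performs can be sketched as follows. The function name schema_mismatch_message, the representation of a schema as a dictionary from column names to type names, and the message wording are all hypothetical. Pathway's actual check and messages may differ.

```python
from typing import Dict, List, Optional


def schema_mismatch_message(
    delta_schema: Dict[str, str], pathway_schema: Dict[str, str]
) -> Optional[str]:
    """Hypothetical check: describe differences between two schemas, or return None."""
    problems: List[str] = []
    for column in sorted(delta_schema.keys() | pathway_schema.keys()):
        if column not in pathway_schema:
            problems.append(
                f"column {column!r} exists in the Delta table but not in the Pathway table"
            )
        elif column not in delta_schema:
            problems.append(
                f"column {column!r} exists in the Pathway table but not in the Delta table"
            )
        elif delta_schema[column] != pathway_schema[column]:
            problems.append(
                f"column {column!r} has type {delta_schema[column]} in the Delta table "
                f"but {pathway_schema[column]} in the Pathway table"
            )
    if not problems:
        return None
    return "Schema mismatch: " + "; ".join(problems)


print(schema_mismatch_message({"id": "int", "name": "str"}, {"id": "float", "age": "int"}))
```

Listing every differing column at once, rather than stopping at the first one, lets a single failed run report everything that needs fixing.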

v0.20.0

25 Feb 08:10

[0.20.0] - 2025-02-25

Added

  • Added structure-aware chunking for DoclingParser.
  • Added table_parsing_strategy for DoclingParser.
  • Column expressions as_int(), as_float(), as_str(), and as_bool() now accept additional arguments, unwrap and default, to simplify null handling.
  • Support for Python tuples in expressions.

Changed

  • BREAKING: Changed the argument in DoclingParser from parse_images (bool) into image_parsing_strategy (Literal["llm"] | None).
  • BREAKING: The doc_post_processors argument in pw.xpacks.llm.document_store.DocumentStore no longer accepts pw.UDF.
  • Better error messages when using pathway spawn with multiple workers: errors are now printed only by the worker that encounters them.

Fixed

  • doc_post_processors argument in the pw.xpacks.llm.document_store.DocumentStore had no effect. This is now fixed.

v0.19.0

20 Feb 13:12

Added

  • LLMReranker now supports custom prompts as well as custom response parsers, allowing ranking scales other than the default 1-5.
  • pw.io.kafka.write and pw.io.nats.write now support ColumnReference as a topic name. When a ColumnReference is provided, each message's topic is determined by the corresponding column value.
  • pw.io.python.write now accepts a ConnectorObserver as an alternative to pw.io.subscribe.
  • pw.io.iceberg.read and pw.io.iceberg.write now support S3 as data backend and AWS Glue catalog implementations.
  • All output connectors now support the sort_by field for ordering output within a single minibatch.
  • A new UDF executor, pw.udfs.fully_async_executor. It allows creating non-blocking asynchronous UDFs whose results can be returned at a later processing time.
  • A Future data type to represent results of fully asynchronous UDFs.
  • pw.Table.await_futures method to wait for results of fully asynchronous UDFs.
  • pw.io.deltalake.write now supports partition columns specification.
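Because sort_by orders rows only within a single minibatch, the output as a whole is not globally sorted. The plain-Python model below illustrates this. It assumes each output event is a (time, row) pair emitted in time order, and that a minibatch is a run of consecutive events sharing the same time. It does not call Pathway, and order_within_minibatches is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def order_within_minibatches(
    events: Sequence[Tuple[int, Row]], sort_by: Sequence[str]
) -> List[Row]:
    """Model: sort rows by the sort_by columns inside each minibatch (same time)."""
    ordered: List[Row] = []
    # groupby merges consecutive events with the same time into one minibatch.
    for _, batch in groupby(events, key=itemgetter(0)):
        rows = [row for _, row in batch]
        ordered.extend(sorted(rows, key=lambda r: tuple(r[c] for c in sort_by)))
    return ordered


events = [
    (2, {"name": "carol", "score": 7}),
    (2, {"name": "alice", "score": 9}),
    (4, {"name": "bob", "score": 1}),
    (4, {"name": "aaron", "score": 5}),
]
print([r["name"] for r in order_within_minibatches(events, sort_by=["name"])])
```

Here "aaron" is emitted after "carol" because it belongs to a later minibatch.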

Changed

  • BREAKING: Changed the interface of LLMReranker, the use_logit_bias, cache_strategy, retry_strategy and kwargs arguments are no longer supported.
  • BREAKING: LLMReranker no longer inherits from pw.UDF.
  • BREAKING: pw.stdlib.utils.AsyncTransformer.output_table now returns a table with columns with Future data type.
  • pw.io.deltalake.read can now read append-only tables without requiring explicit specification of primary key fields.

v0.18.0

07 Feb 16:10

Added

  • pw.io.postgres.write and pw.io.postgres.write_snapshot now handle serialization of PyObjectWrapper and Timedelta properly.
  • New chunking options in pathway.xpacks.llm.parsers.UnstructuredParser.
  • Now all Pathway types can be serialized into JSON and consistently deserialized back.
  • table.col.dt.to_duration for converting an integer into a pw.Duration.
  • pw.Json now supports storing datetime and duration type values in ISO format.

Changed

  • BREAKING: Changed the interface of UnstructuredParser.
  • BREAKING: The Pointer type is now serialized and deserialized as a string field in Iceberg and Delta Lake.
  • BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the JSON format is used. A string field is used to store the encoded contents.
  • BREAKING: The Array type is now serialized and deserialized as an object with two fields: shape denoting the shape of the stored multi-dimensional array and elements denoting the elements of the flattened array.
  • BREAKING: Marked package as py.typed to indicate support for type hints.
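The JSON layout for arrays and bytes described above can be reproduced with the standard library. This is a sketch under assumptions: arrays are modelled as rectangular nested Python lists rather than NumPy arrays, elements are flattened in row-major order, and the helper names are hypothetical.

```python
import base64
import json
from typing import Any, Dict, List


def flatten(nested: Any) -> List[Any]:
    # Row-major flattening of a nested list (assumed element order).
    if isinstance(nested, list):
        out: List[Any] = []
        for item in nested:
            out.extend(flatten(item))
        return out
    return [nested]


def array_to_json(nested: Any) -> Dict[str, Any]:
    shape: List[int] = []
    level = nested
    # Walk down the first element of each level to read the dimensions.
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0] if level else None
    return {"shape": shape, "elements": flatten(nested)}


def json_to_array(obj: Dict[str, Any]) -> Any:
    def reshape(elements: List[Any], shape: List[int]) -> Any:
        if not shape:  # zero-dimensional array: a single scalar
            return elements[0]
        if len(shape) == 1:
            return list(elements)
        if shape[0] == 0:
            return []
        step = len(elements) // shape[0]
        return [reshape(elements[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]

    return reshape(obj["elements"], obj["shape"])


matrix = [[1, 2, 3], [4, 5, 6]]
record = {
    "matrix": array_to_json(matrix),
    "blob": base64.b64encode(b"hi").decode("ascii"),  # Bytes -> base64 string field
}
text = json.dumps(record)
print(text)
```

Storing the shape next to the flat element list keeps the encoding valid JSON for any number of dimensions, and the original array can be rebuilt from those two fields alone.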

Removed

  • BREAKING: Removed undocumented license_key argument from pw.run and pw.run_all methods. Instead, pw.set_license_key should be used.

v0.17.0

31 Jan 12:07

Added

  • pw.io.iceberg.read method for reading Apache Iceberg tables into Pathway.
  • Methods pw.io.postgres.write and pw.io.postgres.write_snapshot now accept an additional init_mode argument, which allows initializing the table before writing.
  • pw.io.deltalake.read now supports serialization and deserialization for all Pathway data types.
  • New parser pathway.xpacks.llm.parsers.DoclingParser supporting parsing of PDFs with tables and images.
  • Output connectors now include an optional name parameter. If provided, this name will appear in logs and monitoring dashboards.
  • Automatic naming for input and output connectors has been enhanced.

Changed

  • BREAKING: pw.io.deltalake.read now requires explicit specification of primary key fields.
  • BREAKING: pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now returns a dictionary from the pw_ai_answer endpoint.
  • pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer can optionally return context documents from the pw_ai_answer endpoint.
  • BREAKING: When using delay in temporal behavior, current time is updated immediately, not in the next batch.
  • BREAKING: The Pointer type is now serialized to Delta Tables as raw bytes.
  • pw.io.kafka.write now allows specifying the key and headers for the JSON and CSV data formats.
  • persistent_id parameter in connectors has been renamed to name. This new name parameter allows you to assign names to connectors, which will appear in logs and monitoring dashboards.
  • Changed names of parsers to be more consistent: ParseUnstructured -> UnstructuredParser, ParseUtf8 -> Utf8Parser. ParseUnstructured and ParseUtf8 are now deprecated.

Fixed

  • generate_class method in Schema now correctly renders columns of UnionType and None types.
  • A bug in delay in temporal behavior that could cause a single entry to be emitted twice in specific situations.
  • pw.io.postgres.write_snapshot now correctly handles tables that only have primary key columns.

Removed

  • BREAKING: pw.indexing.build_sorted_index, pw.indexing.retrieve_prev_next_values, pw.indexing.sort_from_index and pw.indexing.SortedIndex are removed. Sorting is now done with pw.Table.sort.
  • BREAKING: Removed deprecated methods pw.Table.unsafe_promise_same_universe_as, pw.Table.unsafe_promise_universes_are_pairwise_disjoint, pw.Table.unsafe_promise_universe_is_subset_of, pw.Table.left_join, pw.Table.right_join, pw.Table.outer_join, pw.stdlib.utils.AsyncTransformer.result.
  • BREAKING: Removed deprecated column _pw_shard in the result of windowby.
  • BREAKING: Removed deprecated functions pw.debug.parse_to_table, pw.udf_async, pw.reducers.npsum, pw.reducers.int_sum, pw.stdlib.utils.col.flatten_column.
  • BREAKING: Removed deprecated module pw.asynchronous.
  • BREAKING: Removed deprecated access to functions from pw.io in pw.
  • BREAKING: Removed deprecated classes pw.UDFSync, pw.UDFAsync.
  • BREAKING: Removed class pw.xpacks.llm.parsers.OpenParse. Its functionality has been replaced by pw.xpacks.llm.parsers.DoclingParser.
  • BREAKING: Removed deprecated arguments from input connectors: value_columns, primary_key, types, default_values. Schema should be used instead.
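pw.Table.sort produces, for every row, pointers to the previous and next rows in sorted order (the prev and next columns). That result can be modelled in plain Python as shown below. This is a simplified sketch that ignores incremental updates. Breaking ties by row id is an assumption made here to keep the output deterministic, and sort_prev_next is a hypothetical name.

```python
from typing import Any, Dict, Hashable, Optional


def sort_prev_next(
    rows: Dict[Hashable, Dict[str, Any]], key: str
) -> Dict[Hashable, Dict[str, Optional[Hashable]]]:
    """Model: map each row id to the ids of its neighbours in sorted order."""
    # Ties on the key are broken by row id (an assumption for determinism).
    order = sorted(rows, key=lambda rid: (rows[rid][key], rid))
    result: Dict[Hashable, Dict[str, Optional[Hashable]]] = {}
    for i, rid in enumerate(order):
        result[rid] = {
            "prev": order[i - 1] if i > 0 else None,
            "next": order[i + 1] if i + 1 < len(order) else None,
        }
    return result


rows = {"r1": {"t": 30}, "r2": {"t": 10}, "r3": {"t": 20}}
print(sort_prev_next(rows, key="t"))
```

Representing order as neighbour pointers rather than positions means that inserting one row only changes the pointers of its two neighbours.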

v0.16.4

09 Jan 15:14

Fixed

  • The Google Drive connector in static mode now displays correctly in Jupyter visualizations.

v0.16.3

02 Jan 14:38

Added

  • pw.io.iceberg.write method for writing Pathway tables into Apache Iceberg.

Changed

  • Values of non-deterministic UDFs are not stored in append_only tables.
  • pw.Table.ix now has a better runtime error message that includes the id of the missing row.

Fixed

  • Temporal behaviors in temporal operators (windowby, interval_join) now consume no CPU when no data passes through them.