Releases: pathwaycom/pathway
v0.21.2
Added
- Added a synchronization group mechanism to align multiple data sources based on selected columns. It can be accessed with pw.io.register_input_synchronization_group.
- pw.io.register_input_synchronization_group now supports the following types of columns: pw.DateTimeUtc, pw.DateTimeNaive, pw.Duration, and int.
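The notes do not spell out how a synchronization group decides what to hold back. The toy model below assumes one common policy: each source's values in the shared column arrive in non-decreasing order, and a value is released only when it is at most max_difference ahead of the slowest source. SyncGroupSketch and max_difference are hypothetical names used for illustration; this is not Pathway's implementation or the signature of pw.io.register_input_synchronization_group.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SyncGroupSketch:
    """Toy model of aligning sources on one shared, non-decreasing column.

    Hypothetical class (not Pathway's implementation): a value is released
    only if it is at most `max_difference` ahead of the slowest source.
    """

    max_difference: int
    latest: dict = field(default_factory=dict)   # last released value per source
    pending: dict = field(default_factory=dict)  # held-back values per source

    def add_source(self, name: str) -> None:
        self.pending[name] = deque()

    def push(self, source: str, value: int) -> list:
        """Queue a value and return every (source, value) that can now be released."""
        self.pending[source].append(value)
        released = []
        while True:
            # Each source's frontier: its oldest queued value, or its last released one.
            frontiers = []
            for name, queue in self.pending.items():
                if queue:
                    frontiers.append(queue[0])
                elif name in self.latest:
                    frontiers.append(self.latest[name])
                else:
                    return released  # a source has produced nothing yet: wait for it
            bound = min(frontiers) + self.max_difference
            progressed = False
            for name, queue in self.pending.items():
                while queue and queue[0] <= bound:
                    value_out = queue.popleft()
                    self.latest[name] = value_out
                    released.append((name, value_out))
                    progressed = True
            if not progressed:
                return released


group = SyncGroupSketch(max_difference=5)
group.add_source("a")
group.add_source("b")
print(group.push("a", 1), group.push("b", 2))
```

With max_difference=5, a later value of 10 from source a stays queued while source b is still at 2, and it is released only once b reaches 6.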
Changed
- Enhanced error reporting for runtime errors across most operators, providing a trace that simplifies identifying the root cause.
Fixed
- Fixed a problem with list_documents() when no documents are present in the store.
- The append-only property of tables created by pw.io.kafka.read is now set correctly.
v0.21.1
Changed
- Input connectors now throttle parsing error messages if parsing errors account for more than 10% of parsing attempts.
- New flag return_status for the inputs_query method in pw.xpacks.llm.DocumentStore. If set to True, DocumentStore returns the indexing status of each file.
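One way to read the 10% rule for parsing errors is as a ratio check against all parsing attempts seen so far. The sketch below assumes exactly that; ParseErrorThrottler is a hypothetical helper, and Pathway's real policy (for example, whether it also logs a summary of suppressed messages) may differ.

```python
import logging


class ParseErrorThrottler:
    """Hypothetical helper, not Pathway's code: logs a parse error only while
    errors make up at most `threshold` of all parsing attempts so far."""

    def __init__(self, threshold: float = 0.10) -> None:
        self.threshold = threshold
        self.attempts = 0
        self.errors = 0
        self.suppressed = 0

    def record(self, ok: bool, message: str = "") -> bool:
        """Record one parse attempt; return True if an error message was logged."""
        self.attempts += 1
        if ok:
            return False
        self.errors += 1
        if self.errors / self.attempts > self.threshold:
            self.suppressed += 1  # error share above the threshold: stay quiet
            return False
        logging.getLogger("parser").error(message)
        return True
```

Once the error share rises above the threshold, further messages are only counted, which keeps a badly formatted source from flooding the logs.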
v0.21.0
Added
- All Pathway types can now be serialized to CSV using pw.io.csv.write and deserialized back using pw.io.csv.read.
- pw.io.csv.read now parses null values in data when this can be done unambiguously.
Changed
- BREAKING: Updated endpoints in pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer:
  - Deprecated: /v1/pw_list_documents, /v1/pw_ai_answer
  - New: /v2/list_documents, /v2/answer
- RAG methods under pw.xpacks.llm.question_answering.RAGClient are renamed, and they now use the new endpoints. Old methods are deprecated and will be removed in the future.
  - pw_ai_summary -> summarize
  - pw_ai_answer -> answer
  - pw_list_documents -> list_documents
- When pw.io.deltalake.write creates a table, it also stores its metadata in the columns of the created Delta table. This metadata can be used by Pathway when reading the table with pw.io.deltalake.read if no schema is specified.
- The schema parameter is now optional for pw.io.deltalake.read. If the table was created by Pathway and the schema was not specified by the user, it is read from the table metadata.
- pw.io.deltalake.write now aligns the output metadata with the existing table's metadata, preserving any custom metadata in the sink.
- BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the CSV format is used.
- BREAKING: The Duration type is now serialized and deserialized as a number of nanoseconds when the CSV format is used.
- BREAKING: The tuple and np.ndarray types are now serialized and deserialized as their JSON representations when the CSV format is used.
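Keeping the old RAGClient method names working while steering callers to the new ones is typically done with forwarding aliases that emit a DeprecationWarning. The sketch below shows that pattern on a hypothetical RAGClientSketch whose methods only echo their inputs; it does not reproduce Pathway's RAGClient or its HTTP calls.

```python
import warnings


class RAGClientSketch:
    """Hypothetical stand-in for a client whose methods were renamed.

    The new-style methods just echo their inputs so the aliasing
    pattern can run without a server.
    """

    def answer(self, prompt: str) -> str:
        return f"answer:{prompt}"

    def summarize(self, texts: list) -> str:
        return f"summary of {len(texts)} texts"

    def list_documents(self) -> list:
        return []


def _deprecated_alias(new_name: str):
    """Build a method that warns, then forwards the call to `new_name`."""

    def method(self, *args, **kwargs):
        warnings.warn(f"use {new_name}() instead", DeprecationWarning, stacklevel=2)
        return getattr(self, new_name)(*args, **kwargs)

    return method


# Old name -> new name, as listed in the v0.21.0 notes.
RENAMES = {
    "pw_ai_summary": "summarize",
    "pw_ai_answer": "answer",
    "pw_list_documents": "list_documents",
}
for old, new in RENAMES.items():
    setattr(RAGClientSketch, old, _deprecated_alias(new))
```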
Fixed
- pw.io.csv.write now correctly escapes quote characters.
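The CSV encodings listed for v0.21.0 can be approximated with the standard library: base64 text for bytes, an integer nanosecond count for durations, and JSON text for tuples, with embedded quote characters escaped by doubling them. This is a hedged sketch, not Pathway's serializer: encode_fields and decode_fields are hypothetical helpers, the column layout of files written by pw.io.csv.write is not reproduced, and Python's timedelta only has microsecond precision, so the nanosecond count is always a multiple of 1000 here. A NumPy array could be handled the same way after converting it with tolist().

```python
import base64
import csv
import io
import json
from datetime import timedelta


def encode_fields(data: bytes, duration: timedelta, pair: tuple) -> list:
    """Turn typed values into CSV text fields (illustrative sketch)."""
    return [
        base64.b64encode(data).decode("ascii"),  # bytes -> base64 text
        str(duration // timedelta(microseconds=1) * 1000),  # duration -> integer nanoseconds
        json.dumps(list(pair)),  # tuple -> JSON array text
    ]


def decode_fields(fields: list) -> tuple:
    """Inverse of encode_fields; timedelta keeps only microsecond precision."""
    data = base64.b64decode(fields[0])
    duration = timedelta(microseconds=int(fields[1]) // 1000)
    pair = tuple(json.loads(fields[2]))
    return data, duration, pair


# Round trip through the csv module; its default dialect wraps fields that
# contain quotes or commas in quotes and doubles any embedded quote characters.
buf = io.StringIO()
csv.writer(buf).writerow(
    encode_fields(b"\x00\xff", timedelta(seconds=1, microseconds=5), (1, 'say "hi"'))
)
buf.seek(0)
row = next(csv.reader(buf))
print(decode_fields(row))
```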
v0.20.1
Added
- Added RecursiveSplitter.
- pw.io.deltalake.write now checks that the schema of the target Delta table corresponds to the schema of the Pathway table that is sent to the output. If the schemas differ, a human-readable error message is produced.
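The release note does not describe RecursiveSplitter's parameters or algorithm. The sketch below shows the generic recursive character-splitting technique that the name suggests: split on the coarsest separator first (paragraphs, then lines, then spaces), recurse only into pieces that are still too long, and fall back to fixed-size cuts. recursive_split is a hypothetical function that measures length in characters; a production splitter may count tokens instead.

```python
def recursive_split(text: str, max_len: int, separators: tuple = ("\n\n", "\n", " ", "")) -> list:
    """Split text into chunks of at most max_len characters.

    Try the coarsest separator first; pieces that are still too long are
    split again with the next, finer separator. The empty separator (which
    must come last) means "cut into fixed-size windows" and ends the recursion.
    """
    if len(text) <= max_len:
        return [text] if text else []
    sep, *finer = separators
    if sep == "":
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    chunks = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_len:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_len:
            current = piece
        else:
            # Piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, max_len, tuple(finer)))
            current = ""
    if current:
        chunks.append(current)
    return chunks


doc = "aaaa bbbb\n\ncccc dddd eeee"
print(recursive_split(doc, 10))  # ['aaaa bbbb', 'cccc dddd', 'eeee']
```

Preferring coarse separators keeps paragraphs and lines intact whenever they fit, which usually yields more coherent chunks than fixed-size windows.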
v0.20.0
[0.20.0] - 2025-02-25
Added
- Added structure-aware chunking for DoclingParser.
- Added table_parsing_strategy for DoclingParser.
- Column expressions as_int(), as_float(), as_str(), and as_bool() now accept additional arguments, unwrap and default, to simplify null handling.
- Support for Python tuples in expressions.
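To make the intent of unwrap and default concrete, here is a plain-Python model of null handling for a single value. The semantics are assumed rather than taken from Pathway's documentation: a null becomes default when one is given, and unwrap=True turns a remaining null into an error instead of propagating it. as_int here is a standalone hypothetical function, not the Pathway column method.

```python
from typing import Any, Optional


def as_int(value: Any, *, unwrap: bool = False, default: Optional[int] = None) -> Optional[int]:
    """Hypothetical model of null-aware integer conversion (assumed semantics)."""
    if value is None:
        if default is not None:
            return default  # fill nulls with the supplied default
        if unwrap:
            # unwrap promises a non-null result, so a null is an error
            raise ValueError("unwrap=True but the value is null")
        return None  # otherwise nulls propagate
    return int(value)


print(as_int(None), as_int(None, default=0), as_int(7, unwrap=True))
```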
Changed
- BREAKING: Changed the argument in DoclingParser from parse_images (bool) to image_parsing_strategy (Literal["llm"] | None).
- BREAKING: The doc_post_processors argument in pw.xpacks.llm.document_store.DocumentStore no longer accepts pw.UDF.
- Better error messages when using pathway spawn with multiple workers. Error messages are now printed only by the worker that experiences the error.
Fixed
- The doc_post_processors argument in pw.xpacks.llm.document_store.DocumentStore had no effect. This is now fixed.
v0.19.0
Added
- LLMReranker now supports custom prompts as well as custom response parsers, allowing for ranking scales other than the default 1-5.
- pw.io.kafka.write and pw.io.nats.write now support ColumnReference as a topic name. When a ColumnReference is provided, each message's topic is determined by the corresponding column value.
- pw.io.python.write now accepts ConnectorObserver as an alternative to pw.io.subscribe.
- pw.io.iceberg.read and pw.io.iceberg.write now support S3 as a data backend and AWS Glue catalog implementations.
- All output connectors now support the sort_by field for ordering output within a single minibatch.
- A new UDF executor, pw.udfs.fully_async_executor. It allows for the creation of non-blocking asynchronous UDFs whose results can be returned at a later processing time.
- A Future data type to represent results of fully asynchronous UDFs.
- pw.Table.await_futures method to wait for results of fully asynchronous UDFs.
- pw.io.deltalake.write now supports specifying partition columns.
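Two of the output-connector features above combine naturally: within one minibatch, rows are ordered by the sort_by columns and then routed to the topic named in each row. The toy function below models that behavior on plain dictionaries; route_minibatch and the column names topic and ts are hypothetical, and no Kafka or NATS client is involved.

```python
from collections import defaultdict
from typing import Any


def route_minibatch(rows: list, topic_column: str, sort_by: list) -> dict:
    """Group one minibatch of output rows by a per-row topic, ordered by `sort_by`.

    Toy model, not Pathway's connector code: the batch is sorted by the
    `sort_by` columns first, so each topic's messages keep that order.
    """
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in sort_by))
    by_topic: dict = defaultdict(list)
    for row in ordered:
        by_topic[row[topic_column]].append(row)  # topic taken from the row itself
    return dict(by_topic)


batch: list[dict[str, Any]] = [
    {"topic": "alerts", "ts": 3, "msg": "c"},
    {"topic": "metrics", "ts": 1, "msg": "a"},
    {"topic": "alerts", "ts": 2, "msg": "b"},
]
routed = route_minibatch(batch, "topic", ["ts"])
print({t: [r["msg"] for r in rs] for t, rs in routed.items()})
```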
Changed
- BREAKING: Changed the interface of LLMReranker; the use_logit_bias, cache_strategy, retry_strategy and kwargs arguments are no longer supported.
- BREAKING: LLMReranker no longer inherits from pw.UDF.
- BREAKING: pw.stdlib.utils.AsyncTransformer.output_table now returns a table with columns of the Future data type.
- pw.io.deltalake.read can now read append-only tables without requiring explicit specification of primary key fields.
v0.18.0
Added
- pw.io.postgres.write and pw.io.postgres.write_snapshot now handle serialization of PyObjectWrapper and Timedelta properly.
- New chunking options in pathway.xpacks.llm.parsers.UnstructuredParser.
- All Pathway types can now be serialized into JSON and consistently deserialized back.
- table.col.dt.to_duration converts an integer into a pw.Duration.
- pw.Json now supports storing datetime and duration type values in ISO format.
Changed
- BREAKING: Changed the interface of UnstructuredParser.
- BREAKING: The Pointer type is now serialized and deserialized as a string field in Iceberg and Delta Lake.
- BREAKING: The Bytes type is now serialized and deserialized with base64 encoding and decoding when the JSON format is used. A string field is used to store the encoded contents.
- BREAKING: The Array type is now serialized and deserialized as an object with two fields: shape, denoting the shape of the stored multi-dimensional array, and elements, denoting the elements of the flattened array.
- BREAKING: Marked package as py.typed to indicate support for type hints.
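The JSON layouts described above can be sketched with nested Python lists standing in for arrays: an object with shape and elements fields, plus a base64 string for bytes. Row-major flattening is an assumption (the notes only say the array is flattened), and array_to_json and the related helpers are hypothetical, not Pathway's code.

```python
import base64
import json


def _shape(value) -> list:
    """Dimensions of a rectangular nested list (an empty list ends the walk)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def _flatten(value) -> list:
    """Row-major flattening of a nested list."""
    if not isinstance(value, list):
        return [value]
    return [x for item in value for x in _flatten(item)]


def _reshape(elements: list, shape: list):
    """Inverse of _flatten for a known shape."""
    if not shape:
        return elements[0]
    step = len(elements) // shape[0] if shape[0] else 0
    return [_reshape(elements[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def array_to_json(array: list) -> str:
    return json.dumps({"shape": _shape(array), "elements": _flatten(array)})


def array_from_json(text: str):
    obj = json.loads(text)
    return _reshape(obj["elements"], obj["shape"])


def bytes_to_json(data: bytes) -> str:
    # Bytes are stored as a JSON string holding their base64 encoding.
    return json.dumps(base64.b64encode(data).decode("ascii"))


def bytes_from_json(text: str) -> bytes:
    return base64.b64decode(json.loads(text))


print(array_to_json([[1, 2, 3], [4, 5, 6]]))
# {"shape": [2, 3], "elements": [1, 2, 3, 4, 5, 6]}
```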
Removed
- BREAKING: Removed the undocumented license_key argument from the pw.run and pw.run_all methods. pw.set_license_key should be used instead.
v0.17.0
Added
- pw.io.iceberg.read method for reading Apache Iceberg tables into Pathway.
- Methods pw.io.postgres.write and pw.io.postgres.write_snapshot now accept an additional argument, init_mode, which allows initializing the table before writing.
- pw.io.deltalake.read now supports serialization and deserialization for all Pathway data types.
- New parser pathway.xpacks.llm.parsers.DoclingParser supporting parsing of PDFs with tables and images.
- Output connectors now include an optional name parameter. If provided, this name will appear in logs and monitoring dashboards.
- Automatic naming for input and output connectors has been enhanced.
Changed
- BREAKING: pw.io.deltalake.read now requires explicit specification of primary key fields.
- BREAKING: pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now returns a dictionary from the pw_ai_answer endpoint.
- pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer allows optionally returning context documents from the pw_ai_answer endpoint.
- BREAKING: When using delay in temporal behavior, the current time is updated immediately, not in the next batch.
- BREAKING: The Pointer type is now serialized to Delta Tables as raw bytes.
- pw.io.kafka.write now allows specifying key and headers for JSON and CSV data formats.
- The persistent_id parameter in connectors has been renamed to name. This new name parameter allows you to assign names to connectors, which will appear in logs and monitoring dashboards.
- Changed names of parsers to be more consistent: ParseUnstructured -> UnstructuredParser, ParseUtf8 -> Utf8Parser. ParseUnstructured and ParseUtf8 are now deprecated.
Fixed
- The generate_class method in Schema now correctly renders columns of UnionType and None types.
- Fixed a bug in delay in temporal behavior: in a specific situation, a single entry could be emitted twice.
- pw.io.postgres.write_snapshot now correctly handles tables that only have primary key columns.
Removed
- BREAKING: pw.indexing.build_sorted_index, pw.indexing.retrieve_prev_next_values, pw.indexing.sort_from_index and pw.indexing.SortedIndex are removed. Sorting is now done with pw.Table.sort.
- BREAKING: Removed deprecated methods pw.Table.unsafe_promise_same_universe_as, pw.Table.unsafe_promise_universes_are_pairwise_disjoint, pw.Table.unsafe_promise_universe_is_subset_of, pw.Table.left_join, pw.Table.right_join, pw.Table.outer_join, pw.stdlib.utils.AsyncTransformer.result.
- BREAKING: Removed deprecated column _pw_shard in the result of windowby.
- BREAKING: Removed deprecated functions pw.debug.parse_to_table, pw.udf_async, pw.reducers.npsum, pw.reducers.int_sum, pw.stdlib.utils.col.flatten_column.
- BREAKING: Removed deprecated module pw.asynchronous.
- BREAKING: Removed deprecated access to functions from pw.io in pw.
- BREAKING: Removed deprecated classes pw.UDFSync, pw.UDFAsync.
- BREAKING: Removed class pw.xpacks.llm.parsers.OpenParse. Its functionality has been replaced with pw.xpacks.llm.parsers.DoclingParser.
- BREAKING: Removed deprecated arguments from input connectors: value_columns, primary_key, types, default_values. Schema should be used instead.
v0.16.4
Fixed
- The Google Drive connector in static mode now displays correctly in Jupyter visualizations.
v0.16.3
Added
- pw.io.iceberg.write method for writing Pathway tables into Apache Iceberg.
Changed
- Values of non-deterministic UDFs are not stored in tables that are append_only.
- pw.Table.ix has a better runtime error message that includes the id of the missing row.
Fixed
- Temporal behaviors in temporal operators (windowby, interval_join) now consume no CPU when no data passes through them.