Releases: pathwaycom/pathway
v0.14.1
Added
- pw.xpacks.llm.embedders.GeminiEmbedder, which is a wrapper for Google Gemini Embedding services.
v0.14.0
Fixed
- pw.debug.table_to_pandas now exports int | None columns correctly.
Changed
- pw.io.airbyte.read can now be used with Airbyte connectors implemented in Python without requiring Docker.
- BREAKING: UDFs now verify the type of returned values at runtime. If a returned value can be cast to the expected type, the value is cast. If the value does not match the expected type and can't be cast, an error is raised.
- BREAKING: The pw.reducers.ndarray reducer requires the input column to have type float, int or Array.
- pw.xpacks.llm.parsers.OpenParse can now extract and parse images & diagrams from PDFs. This can be enabled by setting parse_images. processing_pipeline can also be set to customize the post-processing of doc elements.
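The verify-then-cast behavior described for UDFs can be pictured with a small standalone sketch. This is not Pathway's implementation: the `checked` decorator is hypothetical, and the casting rule (only int to float) is an assumption chosen to keep the example short.

```python
from typing import Any, Callable, get_type_hints


def checked(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: check the returned value against the annotation."""
    # Only plain classes (int, float, str, ...) are handled in this sketch.
    expected = get_type_hints(fn).get("return")

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        value = fn(*args, **kwargs)
        if expected is None or isinstance(value, expected):
            return value  # already the right type
        # Assumed casting rule: an int may be widened to float, nothing else.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            return float(value)
        raise TypeError(
            f"{fn.__name__} returned {type(value).__name__}, expected {expected.__name__}"
        )

    return wrapper


@checked
def half(x: int) -> float:
    return x // 2  # returns an int, which is cast to float


@checked
def broken(x: int) -> int:
    return str(x)  # a str cannot be cast under the assumed rule


half_of_four = half(4)
try:
    broken(1)
    broken_raised = False
except TypeError:
    broken_raised = True
```

In this sketch the check runs on every call, which is the cost that comes with catching a mismatched return value at the point where it is produced.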
v0.13.2
Added
- pw.io.deltalake.read now supports S3 data sources.
- pw.xpacks.llm.parsers.ImageParser, which allows parsing images with the vision LMs.
- pw.xpacks.llm.parsers.SlideParser, which enables parsing PDF and PPTX slides with the vision LMs.
- pw.xpacks.llm.parsers.question_answering.RAGClient, a Python client for Pathway hosted RAG apps.
- pw.xpacks.llm.parsers.question_answering.DeckRetriever, a RAG app that enables searching through slide decks with visual-heavy elements.
Fixed
- pw.xpacks.llm.vector_store.VectorStoreServer now uses new indexes.
Changed
- pw.xpacks.llm.parsers.OpenParse now supports any vision language model, including local and proprietary models, via LiteLLM.
v0.13.1
Added
- pw.io.kafka.read now accepts an autogenerate_key flag. This flag determines the primary key generation policy to apply when reading raw data from the source. You can either use the key from the Kafka message or have Pathway autogenerate one.
- pw.io.deltalake.read input connector that fetches changes from DeltaLake into a Pathway table.
- pw.xpacks.llm.parsers.OpenParse, which allows parsing tables and images in PDFs.
Fixed
- All S3 input connectors (including S3, Min.io, Digital Ocean, and Wasabi) now automatically retry network operations if a failure occurs.
- The issue where the connection to the S3 source fails after partially ingesting an object has been resolved by downloading the object in full first.
v0.13.0
Added
- pw.io.deltalake.write now supports S3 destinations.
Changed
- pw.debug.compute_and_print now allows passing more than one table.
- BREAKING: path parameter in pw.io.deltalake.write renamed to uri.
Fixed
- A bug in pw.Table.deduplicate. If persistent_id is not set, it is no longer generated in pw.PersistenceMode.SELECTIVE_PERSISTING mode.
v0.12.0
Added
- pw.PyObjectWrapper that enables passing Python objects of any type to the engine.
- cache_strategy option added for pw.io.http.rest_connector. It enables cache configuration, which is useful for duplicated requests.
- allow_misses argument to Table.ix and Table.ix_ref methods, which allows for filling rows with missing keys with None values.
- pw.io.deltalake.write output connector that streams the changes of a given table into a DeltaLake storage.
- pw.io.airbyte.read now supports data extraction with Google Cloud Runs.
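The effect of allow_misses can be pictured with a plain-Python analogy. This is only a sketch of the idea, not the Table.ix API: `lookup_rows`, the dict-based index, and the column list are all hypothetical names used for illustration.

```python
def lookup_rows(index, keys, columns, allow_misses=False):
    """Look up each key; a missing key yields a row of None values if allowed."""
    rows = []
    for key in keys:
        if key in index:
            rows.append(index[key])
        elif allow_misses:
            # Fill every column of the missing row with None.
            rows.append({column: None for column in columns})
        else:
            raise KeyError(key)
    return rows


prices = {"a": {"price": 10}, "b": {"price": 20}}
filled = lookup_rows(prices, ["a", "z"], ["price"], allow_misses=True)

try:
    lookup_rows(prices, ["a", "z"], ["price"])
    strict_raised = False
except KeyError:
    strict_raised = True
```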
Removed
- BREAKING: Removed Table.having method.
- BREAKING: Removed pw.DATE_TIME_UTC, pw.DATE_TIME_NAIVE and pw.DURATION as dtype markers. Instead, pw.DateTimeUtc, pw.DateTimeNaive and pw.Duration should be used, which are wrappers for corresponding pandas types.
- BREAKING: Removed class transformers from public API: pw.ClassArg, pw.attribute, pw.input_attribute, pw.input_method, pw.method, pw.output_attribute and pw.transformer.
- BREAKING: Removed several methods from pw.indexing module: binsearch_oracle, filter_cmp_helper, filter_smallest_k and prefix_sum_oracle.
v0.11.2
Added
- pathway.assert_table_has_schema and pathway.table_transformer now accept an allow_subtype argument which, if True, allows column types in the Table to be subtypes of types in the Schema.
- next method to pw.io.python.ConnectorSubject (Python connector) that enables passing values of any type to the engine, not only values that are JSON-serializable. The next method should be the preferred way of passing values from the Python connector.
Changed
- The format argument of pw.io.python.read is deprecated. A data format is inferred from the method used (next_json, next_str, next_bytes) and the provided schema.
Removed
- Removed pw.numba_apply and numba dependency.
Fixed
- Fixed pw.this desugaring bug, where __getitem__ in .ix context was not working properly.
- pw.io.sqlite.read now checks if the data matches the passed schema.
v0.11.1
Added
- query and query_as_of_now of pathway.stdlib.indexing.data_index.DataIndex now accept, in the metadata_column parameter, a column with data of type str | None.
- pathway.xpacks.connectors.sharepoint module under Pathway for Business License.
v0.11.0
Added
- Embedders in the LLM xpack now have a get_embedding_dimension method that returns the number of dimensions used by the chosen embedder.
- pathway.stdlib.indexing.nearest_neighbors, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on k-NN via LSH (implemented in Pathway), and k-NN provided by the USearch library.
- pathway.stdlib.indexing.vector_document_index, with a few predefined instances of pathway.stdlib.indexing.data_index.DataIndex.
- pathway.stdlib.indexing.bm25, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on the BM25 index provided by Tantivy.
- pathway.stdlib.indexing.full_text_document_index, with a predefined instance of pathway.stdlib.indexing.data_index.DataIndex.
- Introduced the reranker module under xpacks.llm. Includes a few re-ranking strategies and utility functions for RAG applications.
Changed
- BREAKING: windowby generates IDs of produced rows differently than in the previous version.
- BREAKING: pw.io.csv.write prints printable non-ASCII characters as regular text, not \u{xxxx}.
- BREAKING: Connector methods pw.io.elasticsearch.read, pw.io.debezium.read, pw.io.fs.read, pw.io.jsonlines.read, pw.io.kafka.read, pw.io.python.read, pw.io.redpanda.read, pw.io.s3.read now check the type of the input data. Previously it was not checked if the provided format was "json"/"jsonlines". If the data is inconsistent with the provided schema, the row is skipped and an error message is emitted.
- BREAKING: query and query_as_of_now methods of pathway.stdlib.indexing.data_index.DataIndex now return pathway.JoinResult, to allow resolving column name conflicts (between columns in the table with queries and the table with index data).
- BREAKING: DataIndex methods query and query_as_of_now now return score in a column named _pw_index_reply_score (defined as the _SCORE variable in pathway.stdlib.indexing.colnames.py).
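The skip-and-report behavior described for the connectors can be illustrated with a standalone sketch. It does not use the Pathway connectors themselves: `filter_rows`, the dict-based schema, and the logger name are hypothetical and stand in for the real type checking.

```python
import logging

logger = logging.getLogger("schema_check_sketch")


def filter_rows(rows, schema):
    """Keep rows whose fields match the schema; skip and report the rest."""
    kept = []
    for row in rows:
        mismatched = [
            name for name, expected in schema.items()
            if not isinstance(row.get(name), expected)
        ]
        if mismatched:
            # The row is skipped rather than aborting the whole read.
            logger.error("skipping row %r: columns %s do not match schema", row, mismatched)
            continue
        kept.append(row)
    return kept


schema = {"id": int, "name": str}
rows = [
    {"id": 1, "name": "a"},
    {"id": "2", "name": "b"},  # "id" is a str, so this row is skipped
    {"id": 3, "name": "c"},
]
kept = filter_rows(rows, schema)
```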
Removed
- BREAKING: pathway.stdlib.indexing.data_index.VectorDocumentIndex class. Some predefined instances are now meant to be obtained via methods provided in pathway.stdlib.indexing.vector_document_index.
- BREAKING: with_distances parameter of query and query_as_of_now methods in pathway.stdlib.indexing.data_index.DataIndex. Instead of 'distance', we now operate with a more general term 'score' (higher = better). For distance-based indices, score is usually defined as negative distance. Score is now always included in the answer, as long as the underlying index returns something that indicates the quality of a match.
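The "score as negative distance" convention can be shown with a short standalone sketch. It is independent of DataIndex: `scores_from_distances` is a hypothetical helper, and Euclidean distance is an assumption made for the example.

```python
import math


def scores_from_distances(query, points):
    """Return (index, score) pairs sorted best-first, with score = -distance."""
    scored = [(i, -math.dist(query, point)) for i, point in enumerate(points)]
    # Higher score means a better match, so sort in descending order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


points = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]
ranked = scores_from_distances((0.0, 0.0), points)
ranked_indices = [i for i, _ in ranked]
```

Negating the distance lets distance-based indices share the same "higher is better" ordering as indices that return a similarity score directly.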
v0.10.1
Added
- query method to VectorStoreServer to enable compatible API with DataIndex.
- AdaptiveRAGQuestionAnswerer to xpacks.question_answering.
- End-to-end pipeline and accompanying code for Private RAG showcase.