Releases: Eventual-Inc/Daft
v0.7.4
What's Changed 🚀
💥 Breaking Changes
- refactor(arrow2)!: remaining arrow2 from daft-core @universalmind303 (#6284)
- refactor(arrow2)!: use arrow-rs casting for strftime function @universalmind303 (#6263)
- refactor(arrow2)!: migrate interval arithmetic to arrow-rs @rohitkulshreshtha (#6186)
✨ Features
- feat(ai): Add HTTP URL direct passthrough for images and videos in prompt function @huleilei (#6182)
- feat(sql): add DATE_TRUNC function support @desmondcheongzx (#6258)
- feat(swordfish): Streaming sources @colin-ho (#5978)
- feat(metrics): add metrics docs @cckellogg (#6253)
- feat(io/av): enhance time-interval sampling with comprehensive tests and improved @huleilei (#6088)
- feat: Add Tencent Cloud COS (Cloud Object Storage) support @XuQianJin-Stars (#6140)
- feat(metrics): consolidate naming and add node.type attribute @cckellogg (#6236)
- feat: add Flight shuffle to Flotilla @srilman (#6123)
- feat: Apache OpenDAL™ compatible backends @universalmind303 (#6177)
- feat(observability): Split duration into separate column in metrics DF @srilman (#6235)
- feat: add support for pyiceberg 0.11.0 @gweaverbiodev (#6200)
- feat: add support for SQL ORDER BY column position @Lucas61000 (#6211)
- feat: Supports running dashboard in daemon mode @plotor (#5993)
- feat: json_write support for time stamps @gpathak128 (#6214)
- feat: `.as_T` cast methods @aaron-ang (#6100)
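The `DATE_TRUNC` entry (#6258) adds SQL date truncation. To illustrate what truncation means, the sketch below rounds a timestamp down to the start of a unit using only the Python standard library. The `date_trunc` helper and its set of units are assumptions for illustration; this is not Daft's implementation or its exact list of supported units.

```python
from datetime import datetime

# Fields to reset for each unit; this unit list is illustrative only.
_RESET = {
    "second": dict(microsecond=0),
    "minute": dict(second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
    "month": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "year": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
}

def date_trunc(unit: str, ts: datetime) -> datetime:
    """Round ts down to the start of `unit` (hypothetical helper, not Daft's API)."""
    try:
        fields = _RESET[unit.lower()]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    # replace() keeps tzinfo, so aware timestamps stay aware.
    return ts.replace(**fields)

print(date_trunc("month", datetime(2024, 5, 17, 13, 45, 30)))  # 2024-05-01 00:00:00
```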
🐛 Bug Fixes
- fix: handle case where join keys are different for sort-merge multi-partition join @gweaverbiodev (#6243)
- fix(arrow2): reinterpret physical array as logical type after cast fallback @desmondcheongzx (#6291)
- fix(sql): resolve GROUP BY ambiguous column names for derived expressions @desmondcheongzx (#6286)
- fix(flight): Add a check for `flight_shuffle_dirs` arg and change default @srilman (#6266)
- fix: Map Literal <-> Python Dict conversion @w2ais (#6084)
- fix: NaN-aware comparator for multi-column search_sorted and sort @desmondcheongzx (#6242)
- fix: add ignore_empty_and_null parameter to `.explode()` @singularityDLW (#6047)
- fix: Broadcast literal expressions in aggregations to match input length @desmondcheongzx (#6155)
- fix: Cleanup imports from dashboard daemon PR @srilman (#6222)
- fix: Fix compilation on main @srilman (#6221)
- fix: canonicalize negative NaN in multi-column sort comparator @ykdojo (#6215)
- fix: delta version parsing @aaron-ang (#6156)
- fix: Register extension type before reading from Lance @ykdojo (#6058)
- fix: use union instead of append_column in window agg to fix schema mismatch @ykdojo (#6178)
- fix: into_batches should not allow downstream shuffle elision @desmondcheongzx (#6170)
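Two fixes above (#6242, #6215) concern NaN handling in multi-column sort and `search_sorted`. NaN compares false against everything, so a naive comparator gives inconsistent orderings; and a bit-level total order such as Rust's `f64::total_cmp` places negative NaNs below negative infinity and positive NaNs above positive infinity, which is why canonicalizing NaNs matters. A minimal Python sketch of the general idea, assuming NaN sorts after every number, all NaNs compare equal, and only ascending order is needed (Daft's actual NaN/null placement rules are not stated here); `nan_aware_key`, `row_key`, `sort_rows`, and `search_sorted` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

def nan_aware_key(value: float) -> Tuple[int, float]:
    # Every NaN (either sign bit) maps to the same key, placed after all numbers.
    return (1, 0.0) if math.isnan(value) else (0, value)

def row_key(row: Sequence[float]) -> Tuple[Tuple[int, float], ...]:
    # Multi-column key: tuples compare lexicographically, column by column.
    return tuple(nan_aware_key(v) for v in row)

def sort_rows(rows: List[Sequence[float]]) -> List[Sequence[float]]:
    return sorted(rows, key=row_key)

def search_sorted(sorted_rows: List[Sequence[float]], row: Sequence[float]) -> int:
    # Leftmost insertion point, using the same key as the sort so both agree on NaN.
    target = row_key(row)
    lo, hi = 0, len(sorted_rows)
    while lo < hi:
        mid = (lo + hi) // 2
        if row_key(sorted_rows[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

Using one key for both sorting and searching is the design point: if the two disagree about where NaN goes, binary search over sorted data returns wrong positions.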
♻️ Refactor
- refactor(arrow2): refactor index_bitmap and time_unit @universalmind303 (#6287)
- refactor(arrow2)!: remaining arrow2 from daft-core @universalmind303 (#6284)
- refactor(arrow2): remove series::try_from(name, arrow2_arr) @universalmind303 (#6283)
- refactor(arrow2): switch to arrow-rs backed arrays @universalmind303 (#6280)
- refactor(arrow2): misc deprecation warnings @universalmind303 (#6270)
- refactor(arrow2): remove from_arrow2 @universalmind303 (#6267)
- refactor(arrow2): fully remove toarrow2 from arrays & series @universalmind303 (#6265)
- refactor(arrow2)!: use arrow-rs casting for strftime function @universalmind303 (#6263)
- refactor(arrow2): use arrow-rs for filtering on python arrays @universalmind303 (#6262)
- refactor(arrow2): migrate cast.rs from arrow2 to arrow-rs @desmondcheongzx (#6239)
- refactor(arrow2): migrate BooleanArray bitmap access to arrow-rs @desmondcheongzx (#6256)
- refactor(arrow2): migrate concat.rs to arrow-rs @desmondcheongzx (#6255)
- refactor(arrow2): remove misc arrow2 references in csv and json @universalmind303 (#6248)
- refactor(arrow2): remove more arrow2 usages in daft-core @universalmind303 (#6249)
- refactor(arrow2): remove arrow2 from daft-recordbatch @universalmind303 (#6231)
- refactor(arrow2): replace arrow2 based buffer with custom impl @universalmind303 (#6247)
- refactor(arrow2)!: migrate interval arithmetic to arrow-rs @rohitkulshreshtha (#6186)
- refactor(arrow2): remove daft_arrow from arrow_growable @universalmind303 (#6251)
- refactor(arrow2): migrate len.rs to arrow-rs @rohitkulshreshtha (#6171)
- refactor(arrow2): migrate image.rs to arrow-rs @rohitkulshreshtha (#6191)
- refactor(arrow2): migrate dyn_compare and probeable to arrow-rs @desmondcheongzx (#6227)
- refactor(arrow2): migrate if_else kernel from arrow2 to arrow-rs @desmondcheongzx (#6240)
- refactor(arrow2): migrate array serdes from arrow2 to arrow-rs @desmondcheongzx (#6238)
- refactor(arrow2): migrate growable internals from arrow2 to arrow-rs @desmondcheongzx (#6228)
- refactor(arrow-rs): Remove basic growable usages @srilman (#5779)
- refactor(arrow2): misc arrow2 cleanups in daft-core @universalmind303 (#6232)
- refactor(arrow2): remove daft_arrow from daft-functions-utf8 @universalmind303 (#6229)
- refactor(arrow2): fully remove daft-arrow from functions-list @universalmind303 (#6230)
- refactor(arrow2): migrates hashing kernel from arrow2 to arrow-rs @universalmind303 (#6166)
- refactor(arrow2): replace arrow2 iterators with custom daft-native iterators @universalmind303 (#6220)
- refactor(arrow2): list kernels @universalmind303 (#6219)
- refactor(arrow2): migrate groups to arrow-rs @cckellogg (#6185)
- refactor(arrow2): second attempt at offsets @universalmind303 (#6162)
- refactor(arrow2): update image_array to not use arrow2 @universalmind303 (#6217)
- refactor(arrow2): Migrate arithmetic kernels @desmondcheongzx (#6193)
- refactor(arrow2): Migrate concat agg @desmondcheongzx (#6190)
📖 Documentation
- docs: Fix broken links to the url modality and various datatypes @desmondcheongzx (#6212)
✅ Tests
- test(benchmarking): add some read_json python benchmarks @universalmind303 (#6288)
- test(parquet): add benchmarks for nested types, codecs, and filter pushdown @desmondcheongzx (#6285)
- test: Filter null bytes from generated column names in property-based tests @desmondcheongzx (#6213)
- test(postmerge): Minor fix to OpenAI integration tests @desmondcheongzx (#6209)
- test: Fix incomplete metrics migration in OpenAI integration tests @desmondcheongzx (#6204)
👷 CI
- ci: include dashboard assets in maturin sdist for manylinux builds @desmondcheongzx (#6281)
- ci: fix restore-mtime exit code when last file is deleted on macos runners @desmondcheongzx (#6268)
- ci: cache workspace crates and share rust caches across all workflows @desmondcheongzx (#6264)
- ci: cache workspace crate artifacts in integration build @desmondcheongzx (#6261)
- ci: share rust cache across branches and restore file mtimes @desmondcheongzx (#6246)
- ci: bump all timeouts from 30 -> 45 @universalmind303 (#6250)
- ci: increase integration-test-build timeout from 45 to 90 minutes @desmondcheongzx (#6244)
- ci: increase integration-test-build timeout from 30 to 45 minutes @desmondcheongzx (#6241)
- ci: increase integration-test-sql timeout from 30 to 45 minutes @desmondcheongzx (#6237)
- ci: Reduce xdist workers on macOS to fix Ray actor timeout @desmondcheongzx (#6210)
- ci: make repository secrets optional for CI workflows @jeevb (#6199)
- ci: Improve CI reliability via disk space reclamation and coverage optimization @desmondcheongzx (#6205)
🔧 Maintenance
- chore: Update codeowners @colin-ho (#6183)
- chore: Adding Issue requirement for PRs, Updating maintainers @madvart (#6196)
- chore(observability): Split dashboard cli into separate start / stop subcommands @srilman (#6234)
- chore(deps): Resolve Dependabot security alerts @desmondcheongzx (#6226)
- chore: address review feedback from #6208 @desmondcheongzx (#6225)
- chore(deps): Bump dependency group with conflict resolution @desmondcheongzx (#6208)
- chore: Replace Bun with Node / NPM @srilman (#6202)
- chore(arrow2): simplify deprecation markers @universalmind303 (#6218)
- chore: add .values method for utf8Array @universalmind303 (#6216)
- chore: add mypy-boto3-glue to aws optional dependencies @Killua7163 (#6080)
Full Changelog: v0.7.3...v0.7.4
v0.7.3
What's Changed 🚀
✨ Features
- feat(observability): Support exporting Flotilla metrics @srilman (#6122)
- feat: Nightly installations under nightly.daft.ai @desmondcheongzx (#6175)
- feat: Support additional `OTEL_*` configuration envs @srilman (#6148)
- feat(lance): Support lance namespace read and write @shaofengshi (#5980)
- feat: add unity oauth m2m access token support @cckellogg (#5839)
- feat: Support snapshot properties for Iceberg writes @desmondcheongzx (#6139)
- feat: delimiter for `agg_concat` @aaron-ang (#6099)
- feat: add uuid function @everySympathy (#5983)
- feat: add custom date and timestamp formatting for CSV writes @madvart (#6073)
- feat(observability): Export metrics out as a table with the result @srilman (#6055)
- feat: comparison ops for list and struct types @aaron-ang (#6104)
- feat(frontend): enhance dashboard UI and fix Ray runner state reporting @Jay-ju (#6063)
- feat(dashboard): backend implementation @Jay-ju (#6062)
- feat: string casing functions @aaron-ang (#6096)
- feat: `list_contains` expression @aaron-ang (#6095)
- feat: specify text embedding dim @aaron-ang (#6097)
- feat: Expr.var with ddof @aaron-ang (#6105)
- feat(lance): add nearest vector search support @huleilei (#6025)
- feat: distance and similarity functions @aaron-ang (#6098)
- feat: add gravitino connector optional dependency @shaofengshi (#6083)
- feat: Add support for ignoring null fields when writing json @gpathak128 (#6049)
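`Expr.var` gained a `ddof` parameter (#6105). Assuming it follows the usual NumPy/pandas convention of dividing by `n - ddof` (so `ddof=0` gives the population variance and `ddof=1` the sample variance), the arithmetic is

Var = Σ (xᵢ − x̄)² / (n − ddof)

The `variance` helper below, its skipping of nulls, and its returning `None` when `n - ddof <= 0` are illustrative assumptions, not Daft's documented behavior.

```python
from typing import Iterable, Optional

def variance(values: Iterable[Optional[float]], ddof: int = 0) -> Optional[float]:
    """sum((x - mean)^2) / (n - ddof); hypothetical helper, not Daft's API."""
    xs = [v for v in values if v is not None]  # assumption: nulls are skipped
    n = len(xs)
    if n - ddof <= 0:
        return None  # assumption: undefined variance becomes null
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

print(variance([1, 2, 3, 4]))          # 1.25 (population, ddof=0)
print(variance([1, 2, 3, 4], ddof=1))  # 1.6666666666666667 (sample, ddof=1)
```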
🐛 Bug Fixes
- fix(docs): Fix broken links in modalities documentation @everettVT (#6197)
- fix: Render Map columns as python dicts instead of list[struct] in terminal prints @srilman (#6198)
- fix: Fix merge conflict with OTEL configuration PR @srilman (#6187)
- fix(observability): Truncate progress bar names by characters, not bytes @desmondcheongzx (#6180)
- fix: fix concat.rs @colin-ho (#6174)
- fix(test): bump actor UDF timeout from 10s to 60s to reduce flakiness @ykdojo (#6163)
- fix(optimizer): Fix bug with pushing filters through anti-joins @desmondcheongzx (#6150)
- fix: Allow `is_in` to accept sets, tuples, and other iterables @desmondcheongzx (#6115)
- fix(ci): restore Rust code coverage by pinning cargo-llvm-cov @desmondcheongzx (#6146)
- fix: make df.into_partitions() work when input num == num_partitions @everySympathy (#6061)
- fix(udf): ensure per-call kwargs in udf v2 are uniquely bound per call site @huleilei (#6079)
- fix(rustfmt): ignore parquet directories @aaron-ang (#6101)
- fix: No need for an extra 1 in `OffsetBufferBuilder` @colin-ho (#6057)
- fix: use unique bucket names for running tests in parallel @rchowell (#6052)
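The progress-bar fix (#6180) truncates names by characters instead of bytes. The hazard is general: in UTF-8 one character can take several bytes, so cutting at a byte offset can split it (in Rust, slicing a `&str` at an offset that is not a character boundary panics). A small Python sketch of the difference; both helper names are hypothetical, and "character" here means a Unicode code point, which is not always what a reader perceives as a single character.

```python
def truncate_by_bytes(name: str, max_bytes: int) -> str:
    # Cuts the UTF-8 encoding; raises UnicodeDecodeError if the cut lands mid-character.
    return name.encode("utf-8")[:max_bytes].decode("utf-8")

def truncate_by_chars(name: str, max_chars: int) -> str:
    # Cuts on code-point boundaries, so the result is always valid text.
    return name[:max_chars]

name = "Déjà vu"  # "é" and "à" each take two bytes in UTF-8
print(truncate_by_chars(name, 2))  # Dé
```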
🚀 Performance
- perf: zero copy on `from_vec` @universalmind303 (#6172)
- perf: Only Serialize Required Cols in Actor UDFs @plotor (#5884)
♻️ Refactor
- refactor(arrow2): migrate time.rs temporal methods to arrow-rs @rohitkulshreshtha (#6160)
- refactor(arrow2): migrate null checks to arrow-rs @rohitkulshreshtha (#6152)
- refactor(arrow2): migrates sparse-tensor to arrow-rs @cckellogg (#6179)
- refactor(swordfish): Separate concat and make streaming sink single input @colin-ho (#6059)
- refactor(arrow2): migrates hll-sketch to arrow-rs @cckellogg (#6169)
- refactor(arrow2): use arrow-rs for python conversions @universalmind303 (#6130)
- refactor: Migrate hll_merge.rs from arrow2 to arrow-rs @rohitkulshreshtha (#6158)
- refactor(arrow2): remove arrow2 from minhash @cckellogg (#6149)
- refactor: Migrate product.rs from arrow2 to arrow-rs @rohitkulshreshtha (#6159)
- refactor(arrow2): migrates hashing kernel from arrow2 to arrow-rs @rchowell (#6056)
- refactor(arrow2): migrate the extra easy kernels to arrow-rs @universalmind303 (#6145)
- refactor(arrow2): migrate BinaryArray iceberg_truncate to arrow-rs @rohitkulshreshtha (#6157)
- refactor(arrow2): migrate DataArray full_null/empty to arrow-rs @rohitkulshreshtha (#6151)
- refactor(arrow2): remove a bunch of from impls that used arrow2 @universalmind303 (#6137)
- refactor(scalarudf): followup to uuid pr @universalmind303 (#6129)
- refactor(arrow-rs): Remove arrow2 from the search_sorted kernel @desmondcheongzx (#6034)
- refactor(arrow-rs): Migrate approx count distinct @desmondcheongzx (#6038)
- refactor(arrow-rs): migrate is_in and get_lit ops to Arrow-rs arrays @huleilei (#6085)
- refactor: Migrate utf8 left function to arrow-rs @huleilei (#6004)
- refactor(arrow2): migrate daft-functions-utf8/split to arrow-rs @colin-ho (#6046)
- refactor(arrow-rs): Migrate sketch_percentile kernel @desmondcheongzx (#6044)
- refactor(swordfish): Separate joins @colin-ho (#6042)
- refactor(arrow2): remove some arrow2 based from impls @universalmind303 (#6054)
- refactor(arrow2): migrate sum, min, max agg kernels @kevinzwang (#6045)
- refactor(arrow2): utf8 comparison kernels @kevinzwang (#6032)
- refactor(flotilla): Use `stream::iter` instead of channel in flotilla source nodes @colin-ho (#6043)
- refactor(arrow2): remove arrow2 *_array() methods from ImageArray @universalmind303 (#6050)
- refactor(arrow-rs): Migrate binary from_iter methods @desmondcheongzx (#6040)
- refactor(observability): Refactor StatSnapshot to be predefined structs @srilman (#6033)
📖 Documentation
- docs: Add daft.File usage throughout modalities @everettVT (#6074)
- docs: Governance proposed changes @jaychia (#6117)
- docs: Clarify imagenet benchmark setup @desmondcheongzx (#6147)
- docs: add TosConfig documentation reference in config.md @huleilei (#6068)
- docs: Update daft.File API and add docstrings, improve error handling in tests @everettVT (#5877)
- docs: add mm structured outputs tutorial @everettVT (#5816)
- docs: Add end-to-end image pipeline example and regression test @huleilei (#6006)
✅ Tests
- test: Add missing result parameter to on_query_end in Google AI test @desmondcheongzx (#6184)
👷 CI
- ci: Exclude common-arrow-ffi from rust tests @desmondcheongzx (#6173)
- ci: Accept pandas StringDtype in schema override tests @desmondcheongzx (#6165)
- ci: Add pyarrow to wheel build test dependencies @desmondcheongzx (#6109)
- ci: Add pytz and numpy to wheel build test dependencies @desmondcheongzx (#6108)
- ci: Fix nightly workflow permissions @desmondcheongzx (#6037)
🔧 Maintenance
- chore: Intermediate op single input @colin-ho (#6189)
- chore: Revert "refactor(arrow2): migrates hashing kernel from arrow2 to arrow-rs" @universalmind303 (#6164)
- chore: Remove `ci/` folder @srilman (#6154)
- chore: agg_concat kernel nits @colin-ho (#6136)
- chore: Remove vendored parquet-format-safe, use upstream patched version @desmondcheongzx (#6118)
- chore(deps): bump the all group with 5 updates @dependabot[bot] (#5908)
- chore: Allows `make clean` to skip cleaning the python virtual environment @plotor (#6103)
- chore: correct comment of enable_scan_task_split_and_merge in set_execution_config @everySympathy (#6077)
- chore: Update lru dependency @desmondcheongzx (#6041)
⬆️ Dependencies
- chore(deps): bump the all group with 5 updates @dependabot[bot] (#5908)
Full Changelog: v0.7.2...v0.7.3
v0.7.2
What's Changed 🚀
✨ Features
- feat: Add name and path properties to daft.File @everettVT (#6024)
- feat(mcap): support topic_start_time_resolver and raw-bytes non-seekable reader @Jay-ju (#5886)
- feat: Add configurable token limits to OpenAI text embedder @kyo-tom (#6017)
- feat: Add guess_mime_type scalar expression for MIME type detection from bytes @copilot-swe-agent[bot] (#5883)
- feat: Channel-less intermediate op @colin-ho (#5999)
- feat: support dropping dimensions in /v1/embeddings requests @rchowell (#5988)
- feat: Add Apache Gravitino virtual file system (gvfs://) write support in io module @shaofengshi (#5965)
- feat: async embed image @colin-ho (#5833)
- feat: support delimiter, quota, header options for csv writer @stayrascal (#5794)
- feat(agg): support map_groups with v2 udf @Jay-ju (#5927)
- feat: `index_col` option in `explode` @aaron-ang (#5842)
- feat: support resumable stream @stayrascal (#5824)
- feat(functions): add shell_op for distributed shell execution @huleilei (#5738)
- feat: Add Apache Gravitino virtual file system (gvfs://) read support in io module @shaofengshi (#5766)
- feat: Support configuring Conda Env for Class UDFs in Flotilla @plotor (#5117)
- feat: image to tensor @aaron-ang (#5847)
- feat: support native writer via tos @stayrascal (#5760)
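`guess_mime_type` (#5883) detects MIME types from raw bytes. Byte-based detection usually relies on file signatures ("magic numbers") at the start of the data. The sketch below shows the idea with a handful of well-known signatures; the table and the `guess_mime_type` helper here are illustrative and are not Daft's list of supported types or its matching logic.

```python
from typing import Optional

# A few well-known leading signatures; illustrative, not exhaustive.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]

def guess_mime_type(data: bytes) -> Optional[str]:
    """Return the MIME type of the first matching signature, or None if unknown."""
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    return None
```

Prefix matching alone is coarse: the ZIP signature also matches ZIP-based formats such as `.docx`, so real detectors typically inspect more of the content.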
🐛 Bug Fixes
- fix(observability): Clean up progress bar naming @srilman (#6028)
- fix(video): correct keyframe seek timestamp calculation for start_time @huleilei (#6005)
- fix: ".*" not handled correctly in SQL planner @Lucas61000 (#5784)
- fix: Optimize the small files issue of sink lance @caican00 (#5844)
- fix: overriding dimensions for openai embedding models @kevinzwang (#6013)
- fix: support externally-hosted models via OpenAI-compatible API when using embed_text func @caican00 (#5873)
- fix: respect model dtype when overriding embedding dimensions @fenfeng9 (#5899)
- fix: Pass csv option into native writer @colin-ho (#6003)
- fix: Nonzero morsel upper bound @colin-ho (#5989)
- fix: Clean up UDF display name in progress bar and plans @srilman (#5810)
- fix(cast): handle whitespace in string-to-number casting @ykdojo (#5955)
- fix: Incorrect buffer pool calculation strategy when reading CSV @plotor (#5857)
- fix: Optimize the display information of Join nodes in query plan @plotor (#5617)
- fix: Fast failure when dashboard is enabled in Ray Runner @plotor (#5867)
- fix(ai): resolve intermittent meta tensor error in classify_text/classify_image @rohitkulshreshtha (#5977)
- fix(ci): cargo machete error that slipped through ci somehow @universalmind303 (#5975)
- fix: Daft.ai link checker to ignore X @everettVT (#5879)
- fix: Supporting fractional gpu count on class udf @caican00 (#5840)
- fix: remove flaky datasets from read_huggingface tests @everettVT (#5926)
- fix: allows appending nulls to lists, null is compatible with all types @rchowell (#5921)
- fix(test): read all splits in HuggingFace integration tests @ykdojo (#5878)
- fix(iceberg): Correct test setup to ensure delete files are created @huleilei (#5864)
- fix(ray): namespace flotilla actor per job to avoid plan id collisions @huleilei (#5855)
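The casting fix (#5955) handles whitespace in string-to-number casts. Strict parsers such as Rust's `str::parse::<i64>()` reject `" 42"`, so input has to be trimmed first. A Python sketch of that shape, using a deliberately strict parser (Python's own `int()` already tolerates surrounding whitespace and would hide the issue). `strict_parse_int`, `cast_to_int`, and returning `None` on failure are assumptions for illustration.

```python
import re
from typing import Optional

# Optional sign followed by ASCII digits only, similar in spirit to a strict integer parser.
_INT_RE = re.compile(r"[+-]?[0-9]+")

def strict_parse_int(s: str) -> Optional[int]:
    # Rejects anything that is not exactly a signed run of digits, including whitespace.
    return int(s) if _INT_RE.fullmatch(s) else None

def cast_to_int(s: Optional[str]) -> Optional[int]:
    # Trim leading/trailing whitespace, then parse strictly; unparsable input becomes None.
    if s is None:
        return None
    return strict_parse_int(s.strip())
```

Only the edges are trimmed: interior whitespace such as `"4 2"` is still rejected.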
🚀 Performance
- perf: don't eval empty recordbatches @universalmind303 (#5968)
♻️ Refactor
- refactor(swordfish): Channel-less blocking sink @colin-ho (#6023)
- refactor(arrow2): rename validity to nulls to align with arrow-rs @universalmind303 (#6027)
- refactor(arrow-rs): remove usages of build_is_equal and replace with … @universalmind303 (#6018)
- refactor(arrow2): arrow based take kernels @universalmind303 (#6022)
- refactor(swordfish): Channel-less streaming sink @colin-ho (#6021)
- refactor(arrow2): remove makegrowable from fsl array @universalmind303 (#6019)
- refactor(flotilla): Swordfish task builder @colin-ho (#5976)
- refactor(arrow-rs): remove makegrowable from concat_agg.rs @universalmind303 (#6020)
- refactor(arrow2): refactor build_probe_table_without_nulls @universalmind303 (#6011)
- refactor(arrow2): rest of comparison.rs @kevinzwang (#6000)
- refactor: abstract TosRetrier to retry all tos operation @stayrascal (#5858)
- refactor(arrow2): functions-utf8 utils @universalmind303 (#5996)
- refactor(arrow2): parquet/read @universalmind303 (#5997)
- refactor(arrow2): remove some arrow2 based from impls @universalmind303 (#5995)
- refactor(arrow2): migrate apply and binaryapply functions @universalmind303 (#5994)
- refactor(arrow2): DaftCompare between two DataArrays @kevinzwang (#5964)
- refactor(arrow-rs): Remove arrow2 from daft-writers @srilman (#5985)
- refactor(arrow2): add new from_iter_values and arange impl @universalmind303 (#5984)
- refactor(arrow-rs): Remove arrow2 from daft-scan and related @srilman (#5974)
- refactor(arrow-rs): Remove arrow2 from sort kernels @desmondcheongzx (#5963)
- refactor(arrow-rs): Upgrade arrow-rs to 57.1.0 @srilman (#5969)
- refactor(arrow2): migrate float.rs to arrow-rs @rohitkulshreshtha (#5953)
- refactor(arrow2): use arrow for file arrays @universalmind303 (#5972)
- refactor(arrow2): count and array/mod @universalmind303 (#5973)
- refactor(arrow-rs): Remove arrow2 from daft-sketch @srilman (#5967)
- refactor(arrow2): migrate repeat.rs to arrow-rs @universalmind303 (#5928)
- refactor(arrow2): migrate left.rs to arrow-rs @universalmind303 (#5930)
- refactor(arrow2): migrate daft-image/src/ops.rs to arrow-rs @universalmind303 (#5940)
- refactor(arrow2): migrate daft-image/series.rs to arrow-rs @universalmind303 (#5939)
- refactor(arrow2): migrate endswith.rs to arrow-rs @universalmind303 (#5933)
- refactor(arrow2): migrate replace.rs to arrow-rs @universalmind303 (#5932)
- refactor(arrow2): migrate find.rs to arrow-rs @universalmind303 (#5929)
- refactor(arrow2): recordbatch ops @kevinzwang (#5962)
- refactor(arrow-rs): Remove arrow2 from WARC reader @srilman (#5948)
- refactor(arrow2): bool_agg @kevinzwang (#5959)
- refactor(arrow2): migrate functions-utf8 repeat, replace, to_datetime @cckellogg (#5961)
- refactor(arrow2): migrate functions-utf8 to_date.rs @cckellogg (#5960)
- refactor(arrow-rs): Remove arrow2 use in jq kernel @colin-ho (#5957)
- refactor(arrow2): migrate functions-utf8 substr.rs @cckellogg (#5958)
- refactor(arrow-rs): remove arrow2 from PartitionedWriter @cckellogg (#5951)
- refactor(arrow2): binary kernels @kevinzwang (#5956)
- refactor(arrow-rs): use arrow-rs for delete map @colin-ho (#5950)
- refactor(arrow2): Remove arrow2 from utf8 array ops @desmondcheongzx (#5954)
- refactor(arrow2): Migrate utf8.right to use arrow-rs instead of arrow2 @huleilei (#5889)
- refactor(arrow-rs): Move IPC conversion from arrow2 to arrow-rs @srilman (#5805)
- refactor(arrow2): migrate streaming_sink/vllm @universalmind303 (#5922)
- refactor(arrow2): add more deprecation markers @universalmind303 (#5917)
- refactor(arrow2): array & series to/from arrow @universalmind303 (#5848)
📖 Documentation
- docs: new section for openai compatible providers @everettVT (#5748)
- docs: add tos config and a specific schema example in write lance @huleilei (#5992)
✅ Tests
- test: Parametrize limit offset tests @colin-ho (#5971)
- test: Don't skip all pyarrow 8.0.0 tests @colin-ho (#5944)
- test: strip whitespace from process output for run_process test @kevinzwang (#5942)
🔧 Maintenance
- chore(observability): Refactor progress bar to remove RuntimeStatsSubscriber @srilman (#6030)
- chore: Use tokio channel for swordfish channels @colin-ho (#6035)
- chore: optimizes slow tests in CI/CD @rchowell (#6029)
- chore: Add workflow permissions @desmondcheongzx (#6014)
- chore: Upgrade nextjs @desmondcheongzx (#6015)
- chore: Update dependencies for dependabot alerts @desmondcheongzx (#6002)
- chore: uv lock check @colin-ho (#6010)
- chore: add some additional iterator methods for daft arrays @universalmind303 (#5937)
- chore: add deprecation warnings on other arrow2 arrays @kevinzwang (#5946)
- chore: ignore all markdown files inside .claude @universalmind303 (#5943)
- chore: ignore all prompting agents inside .claude dir @universalmind303 (#5935)
- chore: consolidates arrow-* crates as workspace dependencies @rchowell (#5923)
- chore: bump `mypy` and `ruff` in pre-commit @aaron-ang (#5836)
- chore: Add basic usage instructions for daft-dashboard in development guide @plotor (#5865)
Full Changelog: v0.7.1...v0.7.2
v0.7.1
What's Changed 🚀
✨ Features
- feat: Support pattern filtering for `SHOW TABLES` @aaron-ang (#5423)
- feat(docs): add copy page as markdown button @ykdojo (#5828)
- feat: support overwrite files by native IO @stayrascal (#5728)
- feat: Add support for Series[start:end] @everySympathy (#5815)
- feat: Implement retry-after mechanism for model apis and udfs @colin-ho (#5769)
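The retry-after mechanism (#5769) lets model API calls and UDFs honor a server-requested delay. An HTTP `Retry-After` header carries either a number of seconds or an HTTP-date (RFC 9110). The sketch below prefers that header and otherwise falls back to capped exponential backoff; `retry_delay`, its `base`/`cap` defaults, and the fallback policy are hypothetical and not taken from Daft. Production clients usually add random jitter, omitted here to keep the output deterministic.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0,
                now: Optional[datetime] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based); hypothetical helper."""
    if retry_after is not None:
        value = retry_after.strip()
        if re.fullmatch(r"[0-9]+", value):  # delay-seconds form, e.g. "120"
            return min(float(value), cap)
        try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
            when = None  # unparsable header: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return min(max((when - now).total_seconds(), 0.0), cap)
    # Fallback: exponential backoff, capped.
    return min(base * (2 ** attempt), cap)
```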
🐛 Bug Fixes
- fix: using estimate memory bytes at first for display scan task source @stayrascal (#5845)
- fix: Remove pop_all assertion @colin-ho (#5850)
- fix: Check if deletion vector propagation is supported in deltalake @cckellogg (#5829)
- fix: Set default ImageMode in decode_image to RGB @colin-ho (#5827)
- fix: handle FileNotFoundError in read_huggingface fallback @ykdojo (#5831)
- fix: Add overflow protection to memory estimation @yudduy (#5417)
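The memory-estimation fix (#5417) adds overflow protection. In Rust, unchecked `u64` arithmetic panics on overflow in debug builds and, by default, wraps in release builds; `saturating_mul`/`saturating_add` clamp at the maximum instead. Python integers never overflow, so the sketch below emulates 64-bit wrapping and saturating arithmetic explicitly. `estimate_bytes` and its inputs are hypothetical; the point is only that an estimate should clamp rather than wrap to a tiny number.

```python
U64_MAX = 2**64 - 1

def saturating_add(a: int, b: int) -> int:
    # Clamp at the u64 maximum instead of wrapping around.
    return min(a + b, U64_MAX)

def saturating_mul(a: int, b: int) -> int:
    return min(a * b, U64_MAX)

def wrapping_mul(a: int, b: int) -> int:
    # Unchecked u64 multiply in a default Rust release build: keep only the low 64 bits.
    return (a * b) & U64_MAX

def estimate_bytes(num_rows: int, bytes_per_row: int, overhead: int = 0) -> int:
    """Hypothetical estimate: rows * bytes_per_row + overhead, saturating at u64::MAX."""
    return saturating_add(saturating_mul(num_rows, bytes_per_row), overhead)

huge = 2**40
print(wrapping_mul(huge, 2**24))    # 0, because the product 2**64 wraps to zero
print(estimate_bytes(huge, 2**24))  # 18446744073709551615 (u64::MAX)
```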
♻️ Refactor
- refactor(arrow2): get field and dtype roundtrips working @universalmind303 (#5849)
📖 Documentation
- docs: improve docstrings of IO read methods for remote URLs @aaron-ang (#5841)
- docs: fix broken links causing CI failure @ykdojo (#5832)
👷 CI
- ci: enable Windows Rust tests on PRs @ykdojo (#5823)
- ci: skip quickstart notebook in notebook-checker workflow @ykdojo (#5804)
🔧 Maintenance
- chore: remove overwrite_files & write_empty_tabular method @stayrascal (#5838)
- chore: Fix a minor ambiguity in the README docs @plotor (#5830)
Full Changelog: v0.7.0...v0.7.1
v0.7.0
What's Changed 🚀
💥 Breaking Changes
- chore!: remove spark connect @universalmind303 (#5743)
✨ Features
- feat(lance): row-level schema evolution support @Jay-ju (#5749)
- feat: Improve model api typing @colin-ho (#5809)
- feat(deltalake): allow users to ignore deletion vectors on read @kevinzwang (#5758)
- feat: Capture UDF argument names and values in structured logging. @rohitkulshreshtha (#5771)
- feat: support split and merge jsonl/ndjson files @caican00 (#5695)
- feat(tools): add markdown-to-notebook converter for documentation @ykdojo (#5691)
- feat: support label_selector specification in ray actor/task creation @Jay-ju (#5042)
- feat: support native csv writer @stayrascal (#5706)
- feat: Update optional dependencies for daft[postgres] @desmondcheongzx (#5586)
- feat(lance): add distributed compaction for Lance @huleilei (#5699)
- feat: Use JSON Serialization for Plans in Subscribers @srilman (#5709)
- feat: extended hash function to take and hash multiple inputs @rahulkodali (#5692)
- feat: Add Apache Gravitino catalog in catalog module @shaofengshi (#5694)
- feat: Better errors when lazy imports fail @samstokes (#5753)
- feat: Added structured logging for UDF errors. @rohitkulshreshtha (#5688)
- feat: Add statistics info to ScanTask when reading Lance dataset @plotor (#5727)
- feat: dynamic batching per operator @universalmind303 (#5676)
- feat: Allow users to disable the suffix range request @TheR1sing3un (#5188)
- feat: Streaming sample by size @colin-ho (#5663)
- feat: Allow dashboard to show query canceled/failed/dead information when query exited abnormally @VOID001 (#5576)
- feat: Add Google AI provider with prompt @everettVT (#5640)
- feat: Add pow expression @kliwongan (#5237)
- feat: No truncate in `.collect` preview @colin-ho (#5632)
- feat: sample api supports precise sampling by size params @caican00 (#5600)
- feat: audio file subtype @universalmind303 (#5602)
- feat(tos): enhance the retry logic to aware response @stayrascal (#5569)
- feat: emit selectivity metric to OTel in swordfish filter op @samstokes (#5584)
- feat: Adding otel logger collector for collecting UDF errors. @rohitkulshreshtha (#5624)
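Streaming sample by size (#5663) draws a fixed number of rows without materializing the whole input. A classic one-pass technique for this is reservoir sampling (Algorithm R), sketched below in Python. It is a standard algorithm shown for illustration; the release note does not say which algorithm Daft uses, and `reservoir_sample` is a hypothetical helper.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Uniform sample of up to k items from a stream of unknown length, in one pass."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Memory stays at O(k) regardless of input size, and each of the n items ends up in the sample with probability k/n.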
🐛 Bug Fixes
- fix: handle Windows paths and query params in local_path_from_uri @ykdojo (#5819)
- fix: use pytest.importorskip for lance in test_limit_offset @ykdojo (#5818)
- fix: support skip empty json/jsonl files @caican00 (#5660)
- fix: minor doc fix @yuchaoran2011 (#5814)
- fix: CountRows with Limit returns unexpected result when reading Lance dataset @plotor (#5550)
- fix: Check for missing dependencies in OpenAI provider @everettVT (#5747)
- fix: fix btree index invalid issue when reading lance for point lookup @caican00 (#5673)
- fix: Combine deltalake with unity extra @everettVT (#5785)
- fix: enhance unit tests @caican00 (#5787)
- fix: patch CVE-2025-66478 update next dependencies to 16.0.7 @everettVT (#5786)
- fix: use single consolidated progress bar in Jupyter notebooks @ykdojo (#5774)
- fix: CuPy → NumPy needs explicit conversion @Jay-ju (#5680)
- fix: Fix Pydantic cloudpickle serialization in Google Colab @ykdojo (#5705)
- fix: update AI integration tests for new Subscriber interface @ykdojo (#5763)
- fix(optimizer): Prevent limits from being pushed below explodes in non-top-level projections @desmondcheongzx (#5292)
- fix(io): load all splits in read_huggingface fallback path @ykdojo (#5757)
- fix(test): use read_huggingface instead of read_parquet for HF test @ykdojo (#5755)
- fix: add disk cleanup to nightly integration-test-io job @ykdojo (#5711)
- fix: Postgres overwrite table should enable RLS and set up pgvector automatically @desmondcheongzx (#5657)
- fix: make it easier to enable different logging levels @Abyss-lord (#5661)
- fix: Dashboard logo animation. @j3nkii (#5672)
- fix(ci): add disk cleanup to integration-test-ai job @ykdojo (#5733)
- fix: update hypothesis test to use new expression API @ykdojo (#5723)
- fix: Fix type annotation check on Python 3.14 @srilman (#5721)
- fix: add fallback mechanism for HuggingFace datasets without parquet files @ykdojo (#5650)
- fix: Import or skip lance @colin-ho (#5662)
- fix: Add missing trailing slashes to S3-compatible endpoint urls @desmondcheongzx (#5575)
- fix: Add outer try-finally block in executor generator @colin-ho (#5633)
- fix: test_explain @universalmind303 (#5656)
- fix: Unify the naming and type of URI parameter for Lance-related APIs @plotor (#5634)
- fix: Fix blocked and oom issues for scan lance @caican00 (#5592)
- fix: Executing `explain` will panic when ScanTask is empty @plotor (#5582)
- fix: Embed text dropping texts @colin-ho (#5641)
- fix: limit(n) return n rows directly @caican00 (#5597)
- fix: Upgrade to deltalake 1.2.1 @colin-ho (#5580)
- fix: add disk cleanup to integration-test-io-credentialed job @ykdojo (#5610)
- fix: add disk cleanup to doctests job @ykdojo (#5609)
- fix: Hashable identifier @colin-ho (#5598)
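The optimizer fix (#5292) keeps limits from being pushed below explodes. The general reason such a pushdown is unsafe: explode changes the row count, so `limit(explode(x), n)` and `explode(limit(x, n))` can differ. A minimal plain-Python sketch, where `explode` and `limit` are simplified stand-ins for the DataFrame operations, not Daft's API:

```python
from typing import List, TypeVar

T = TypeVar("T")

def explode(rows: List[List[T]]) -> List[T]:
    # One output row per list element (empty lists yield no rows in this sketch).
    return [item for row in rows for item in row]

def limit(rows: List[T], n: int) -> List[T]:
    return rows[:n]

rows = [[1, 2, 3], [4, 5]]
correct = limit(explode(rows), 2)  # limit applied after explode
pushed = explode(limit(rows, 2))   # limit pushed below explode
print(correct)  # [1, 2]
print(pushed)   # [1, 2, 3, 4, 5]
```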
🚀 Performance
- perf: Lazy udf worker @colin-ho (#5542)
- perf: Use growable for build side @colin-ho (#5613)
- perf: optimize setting lance schema @Jay-ju (#5704)
♻️ Refactor
- refactor(arrow2): values_iter removals for primitive array @universalmind303 (#5802)
- refactor(arrow2): remove arrow2 from daft-functions-binary @universalmind303 (#5799)
- refactor(arrow2): remove deprecated usages from daft-functions-utf8 @universalmind303 (#5800)
- refactor(arrow2): remove deprecated methods from daft-functions-uri crate @universalmind303 (#5798)
- refactor(arrow2): remove arrow2 from `daft-functions-tokenize` @universalmind303 (#5797)
- refactor(arrow2): rename and deprecate `to_arrow` and `as_arrow` functions @universalmind303 (#5796)
- refactor: write empty dataframe to parquet/json files via native IO @stayrascal (#5682)
- refactor(arrow-rs): Move temporal conversions from arrow2 to arrow-rs @srilman (#5782)
- refactor(arrow-rs): Remove arrow2 Index generics usages @srilman (#5761)
- refactor(arrow2): rename and deprecate .to_arrow methods @universalmind303 (#5789)
- refactor(arrow): use arrow for ffi instead of arrow2 @universalmind303 (#5775)
- refactor(arrow): remove daft-arrow from daft-sql & daft-context crates @universalmind303 (#5773)
- refactor(arrow-rs): Move all validity `daft_arrow::bitmap::Bitmap`s to `daft_arrow::buffer::NullBuffer` @srilman (#5750)
- refactor: abstract MultipartWriter to write data to object store @stayrascal (#5702)
- refactor: Remove Unloaded MicroPartitions @srilman (#5710)
- refactor(arrow-rs): Add `daft-arrow` middleman crate for Rust & Arrow usage @srilman (#5730)
📖 Documentation
- docs: Update slack invite @everettVT (#5813)
- docs: add logging settings @Jay-ju (#5671)
- docs: fix broken Bodo benchmark link @ykdojo (#5762)
- docs: add voice-analytics-example and update index @everettVT (#5737)
- docs: fix broken Lance documentation link @ykdojo (#5724)
- docs: remove redundant About Daft section from README @ykdojo (#5689)
- docs: remove redundant Table of Contents from README @ykdojo (#5684)
- docs: add Daft Cloud mentions to distributed execution docs @ykdojo (#5686)
- docs: fix quickstart connector links formatting @ykdojo (#5687)
- docs: update README to reflect AI/multimodal positioning @ykdojo (#5677)
- docs: Improve mkdocstrings template for Python examples rendering @ykdojo (#5642)
- docs: changed dev url to a live link to prevent 404 @j3nkii (#5669)
- docs: add Python version requirement to README @ykdojo (#5655)
- docs: update index overview page @ykdojo (#5627)
- docs: remove Python tabs from quickstart @ykdojo (#5626)
- docs: update contributor policy, add contributing section, remove old… @madvart (#5251)
- docs: add tip to find your dylib @universalmind303 (#5625)
- docs: add data persistence section to quickstart @ykdojo (#5607)
- docs: revamp quickstart with Amazon product dataset example @ykdojo (#5585)
✅ Tests
- test: fix flaky OpenAI test by using pattern constraint for hex color format @ykdojo (#5808)
- test(io): remove flaky test_read_huggingface_http_urls test @ykdojo (#5795)
👷 CI
- ci: increase unit-test timeout to 75 minutes for macOS @ykdojo (#5731)
- ci: exclude Kaggle from link checker @ykdojo (#5725)
🔧 Maintenance
- chore(deps): bump the minor group across 1 directory with 45 updates @dependabot[bot] (#5734)
- chore: Provide query end state to RuntimeStatsManager on query end @colin-ho (#5791)
- chore: update uvlock to remove tensorflow @kevinzwang (#5780)
- chore: Codeowners @colin-ho (#5502)
- chore: Don't install pytorch in iceberg test docker compose @colin-ho (#5781)
- chore: remove arrow dep from common-image @universalmind303 (#5772)
- chore: Pin dependencies @colin-ho (#5667)
- chore!: remove spark connect @universalmind303 (#5743)
- chore: remove ir and proto crates @universalmind303 (#5742)
- chore(deps): bump actions/checkout from 5 to 6 in the all group @dependabot[bot] (#5713)
- chore(deps): bump ctor from 0.5.0 to 0.6.1 @dependabot[bot] (#5717)
- chore: Cleanup additional Ray runner artifacts @srilman (#5714)
- chore: Remove the old Ray Runner @srilman (#5375)
- chore: Remove expression namespaces @colin-ho (#5619)
- chore: Remove runner from context @colin-ho (#5628)
- chore: add deprecation to daft.udf @kevinzwang (#5665)
- chore(deps): bump the all group with 13 updates @dependabot[bot] (#5480)
- chore: update bug report template @universalmind303 (#5652)
- chore: remove checklist from pr template @universalmind303 (#5653)
- chore: Remove deprecated agg methods and series split @colin-ho (#5630)
- chore: remove broken docpublish job from build-docs workflow @ykdojo (#5631)
- chore: Hint users to use .collect when printing empty data...
v0.6.14
What's Changed 🚀
✨ Features
- feat: embed text metrics @colin-ho (#5583)
- feat: Add description and attributes to custom udf metrics @colin-ho (#5574)
- feat(flotilla): Aggregate Completed Worker Metrics in StatsManager @srilman (#5531)
- feat: add amplification metric for explode operator in native runner @samstokes (#5565)
🐛 Bug Fixes
- fix: Fix empty dataframe show issue @caican00 (#5595)
- fix: Fix openai test metrics fixture @colin-ho (#5593)
- fix: resolve docgen disk space failures by removing unused tools @ykdojo (#5589)
- fix: fix imports in explode.rs @colin-ho (#5573)
- fix: Add support for parsing STRUCT with parentheses syntax @Lucas61000 (#5449)
- fix: Dashboard Verbose Tracing Error @srilman (#5567)
- fix: Skip prompt metrics tests on ray runner @colin-ho (#5564)
📖 Documentation
- docs: Update AI functions usage patterns @everettVT (#5568)
🔧 Maintenance
Full Changelog: v0.6.13...v0.6.14
v0.6.13
What's Changed 🚀
💥 Breaking Changes
- refactor!: Remove support for creating File objects from bytes @universalmind303 (#5556)
✨ Features
- feat: Prompt metrics @colin-ho (#5549)
- feat: Async udf metrics @colin-ho (#5541)
- feat: Support specifying dimensions for text embedding @samstokes (#5543)
- feat: support customized retries error message of S3 request @stayrascal (#5447)
- feat: UDF metrics @colin-ho (#5507)
- feat: Support product @luoyuxia (#5515)
- feat: OTEL Metrics from Swordfish @srilman (#5454)
- feat: add tos object source @stayrascal (#5372)
- feat: Bind the name of the running UDF to the UDFActor @plotor (#5514)
- feat: Support text documents in prompt @colin-ho (#5520)
🐛 Bug Fixes
- fix: sorting on a literal value and aggregation with order-by @kevinzwang (#5547)
- fix: Limit with Offset returns unexpected result when reading Lance dataset @plotor (#5540)
- fix: Add absolute diff threshold to embed_text integration test @colin-ho (#5527)
- fix: test_embed_text_with_none_values with the OpenAI provider fails @desmondcheongzx (#5534)
- fix: Check for numpy dependency in prompt @colin-ho (#5521)
- fix: Handle Nones when embedding text with openai @desmondcheongzx (#5513)
- fix: Fix broken benchmark blog link @colin-ho (#5522)
♻️ Refactor
- refactor!: Remove support for creating File objects from bytes @universalmind303 (#5556)
📖 Documentation
- docs: fix migration guide @kevinzwang (#5563)
- docs: standardize key features casing to sentence case @ykdojo (#5559)
- docs: legacy UDF migration guide @kevinzwang (#5562)
- docs: simplify getting started tip in introduction @ykdojo (#5560)
- docs: update blog icon from bookmark to blog @ykdojo (#5557)
🔧 Maintenance
Full Changelog: v0.6.12...v0.6.13
v0.6.12
What's Changed 🚀
✨ Features
🐛 Bug Fixes
🚀 Performance
📖 Documentation
👷 CI
Full Changelog: v0.6.11...v0.6.12
v0.6.11
What's Changed 🚀
✨ Features
- feat: PostgresCatalog and PostgresTable followups @desmondcheongzx (#5508)
- feat: Add Catalog and Table implementations for PostgreSQL @desmondcheongzx (#5487)
- feat: make maintain_order configurable @stayrascal (#5505)
- feat: chat completions api for prompt function @colin-ho (#5497)
🐛 Bug Fixes
Full Changelog: v0.6.10...v0.6.11
v0.6.10
What's Changed 🚀
✨ Features
- feat: add --addr flag to daft-dashboard cli @VOID001 (#5444)
- feat: Support multiple image and file inputs for prompt function @colin-ho (#5481)
🐛 Bug Fixes
- fix: removes checking model directly for embedding dimensions @rchowell (#5445)
- fix: return-dtype for embed_text/image @universalmind303 (#5496)
- fix: Lower json inflation factor @colin-ho (#5461)
📖 Documentation
- docs: adds daft.func and daft.cls usage with migration page @everettVT (#5475)
🔧 Maintenance
- chore: Drop Python 3.9 @srilman (#5479)
- chore: remove extra from build command @stayrascal (#5493)
Full Changelog: v0.6.9...v0.6.10