Releases: huggingface/pyspark_huggingface

2.0.0

12 Aug 17:23
dc44501

New

  • Enable content-defined chunking by @lhoestq in #13
    • This enables fast, deduplicated uploads
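The note for #13 does not describe how chunking is implemented. To show why content-defined chunking makes deduplicated uploads possible, here is a generic sketch of the technique using a Gear rolling hash, the hash family used by FastCDC. Chunk boundaries depend only on nearby bytes, not on absolute offsets. As a result, an insertion in the middle of a file changes only the chunks around the edit, and every other chunk keeps the same content hash and need not be re-uploaded. The parameters (MIN_SIZE, MAX_SIZE, BOUNDARY_MASK) and the function names are hypothetical choices for this sketch. They are not taken from pyspark_huggingface, Parquet, or the Hub.

```python
import hashlib
import random

# Hypothetical sketch parameters, chosen only for illustration.
MIN_SIZE = 16                  # never cut a chunk shorter than this
MAX_SIZE = 256                 # always cut once a chunk reaches this size
BOUNDARY_MASK = 0x3F << 58     # top 6 bits of the hash: ~1/64 chance of a cut per byte
U64 = (1 << 64) - 1

# Gear table: one fixed pseudo-random 64-bit value per possible byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data wherever a Gear rolling hash of recent bytes hits the boundary pattern."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Each step shifts older contributions left; after 64 steps they fall off
        # the 64-bit word, so h depends only on the last 64 bytes of content.
        h = ((h << 1) + GEAR[byte]) & U64
        size = i - start + 1
        if (size >= MIN_SIZE and (h & BOUNDARY_MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_digests(data: bytes) -> set[str]:
    """Content addresses of the chunks; a chunk already stored need not be sent again."""
    return {hashlib.sha256(c).hexdigest() for c in cdc_chunks(data)}
```

By contrast, fixed-size chunking would shift every chunk after an insertion point, so only the chunks before the edit could be deduplicated.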

What's Changed

New Contributors

Full Changelog: 1.0.0...2.0.0

1.0.0

29 May 14:04

What's Changed

Full Changelog: 0.1.1...1.0.0

0.1.1

14 Mar 18:02

What's Changed

  • Authentication to support reading gated/private datasets on the Hugging Face Hub
  • Support for reading specific Git revisions
  • Improved error messages
  • Reduce requirements by deferring imports until they are actually needed by @wengh in #8
  • Fix import for compatibility with older huggingface_hub by @wengh in #9
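The notes for #8 and #9 do not show the code involved. Two common Python patterns match their descriptions: importing a dependency inside the method that needs it, so that loading the package stays cheap, and trying several import locations so that more than one version of a dependency works. The sketch below assumes nothing about the real huggingface_hub module layout. `import_first`, `HypotheticalReader`, and the module name `hypothetical_fast_json` are all hypothetical, and the standard-library `json` module stands in for the real dependency.

```python
import importlib
from types import ModuleType


def import_first(*candidates: str) -> ModuleType:
    """Import and return the first module in `candidates` that is available.

    Listing the newest location first and older ones after it is one way to
    stay compatible with several versions of a dependency.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate module could be imported (" + "; ".join(errors) + ")")


class HypotheticalReader:
    """Hypothetical reader whose dependency is imported only when data is parsed."""

    def __init__(self, path: str) -> None:
        # Constructing the reader (e.g. while a query is being planned) imports nothing extra.
        self.path = path

    def parse(self, payload: str) -> dict:
        # Deferred import: the cost is paid only when parsing actually happens.
        # "hypothetical_fast_json" is a placeholder; stdlib json is the fallback.
        json = import_first("hypothetical_fast_json", "json")
        return {"path": self.path, **json.loads(payload)}
```

The fallback order matters: the preferred (newer) location is tried first, and older locations are tried only if that import fails.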

Full Changelog: 0.1.0...0.1.1

0.1.0

14 Mar 18:01

What's Changed

  • Add basic HuggingFace Data Source Implementation by @allisonwang-db in #1
  • initial pyproject.toml by @lhoestq in #2
  • Add more features to huggingface reader by @allisonwang-db in #3
  • Enable predicate pushdown by @lhoestq in #4
  • Add HuggingFaceSink data source by @wengh in #5
  • Support custom split name by renaming files by @wengh in #6
  • Use same data source name for reader and writer by @wengh in #7
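The note for #4 does not say how predicate pushdown is wired into the reader. As a generic illustration of what pushdown buys a Parquet-backed source, the sketch below applies filters before reading: row groups whose min/max statistics prove that no row can match are skipped without touching their rows. The `(column, op, value)` filter format (ANDed together, modeled loosely on pyarrow-style filter tuples), `RowGroup`, `make_group`, `may_match`, and `scan` are hypothetical names for this sketch, not the pyspark_huggingface API.

```python
import operator
from dataclasses import dataclass

# Hypothetical filter format for this sketch: (column, op, value) tuples, ANDed together.
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass
class RowGroup:
    rows: list
    stats: dict  # column -> (min, max), like Parquet row-group statistics


def make_group(rows: list) -> RowGroup:
    """Build a row group and record per-column min/max statistics."""
    stats = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in rows[0]}
    return RowGroup(rows, stats)


def may_match(stats: dict, column: str, op: str, value) -> bool:
    """Return False only when min/max prove that no row can satisfy the predicate."""
    if column not in stats:
        return True  # no statistics: the group must be read
    lo, hi = stats[column]
    if op == "==":
        return lo <= value <= hi
    if op == "!=":
        return not (lo == hi == value)
    if op in ("<", "<="):
        return OPS[op](lo, value)
    if op in (">", ">="):
        return OPS[op](hi, value)
    return True


def scan(groups: list, filters: list) -> tuple:
    """Return (matching rows, number of row groups actually read)."""
    rows, groups_read = [], 0
    for group in groups:
        if not all(may_match(group.stats, c, op, v) for c, op, v in filters):
            continue  # skipped using statistics alone
        groups_read += 1
        rows += [r for r in group.rows
                 if all(OPS[op](r[c], v) for c, op, v in filters)]
    return rows, groups_read
```

Take three row groups holding scores 0.00 to 0.99, 1.00 to 1.99, and 2.00 to 2.99, with the filter `[("score", ">", 2.5)]`. Only the last group is read, and it yields 49 matching rows. The row-level check still runs inside each group that is read, because statistics can only rule groups out, never confirm individual rows.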

New Contributors

Full Changelog: https://github.com/huggingface/pyspark_huggingface/commits/0.1.0