Bump datasets from 4.5.0 to 4.6.0#3952

Open
dependabot[bot] wants to merge 1 commit into develop from dependabot/pip/datasets-4.6.0

@dependabot dependabot bot commented on behalf of github Feb 26, 2026

Bumps datasets from 4.5.0 to 4.6.0.

Release notes

Sourced from datasets's releases.

4.6.0

Dataset Features

  • Support Image, Video and Audio types in Lance datasets

    >>> from datasets import load_dataset
    >>> ds = load_dataset("lance-format/Openvid-1M", streaming=True, split="train")
    >>> ds.features
    {'video_blob': Video(),
     'video_path': Value('string'),
     'caption': Value('string'),
     'aesthetic_score': Value('float64'),
     'motion_score': Value('float64'),
     'temporal_consistency_score': Value('float64'),
     'camera_motion': Value('string'),
     'frame': Value('int64'),
     'fps': Value('float64'),
     'seconds': Value('float64'),
     'embedding': List(Value('float32'), length=1024)}
  • Push to hub now supports Video types

    >>> from datasets import Dataset, Video
    >>> ds = Dataset.from_dict({"video": ["path/to/video.mp4"]})
    >>> ds = ds.cast_column("video", Video())
    >>> ds.push_to_hub("username/my-video-dataset")
  • Write image/audio/video blobs as is in parquet (PLAIN) in push_to_hub() by @​lhoestq in huggingface/datasets#7976

    • This enables cross-format Xet deduplication for image/audio/video data, e.g. deduplicating videos between Lance, WebDataset, Parquet files and plain video files, which makes uploads to and downloads from Hugging Face faster
    • For example, if you convert a Lance video dataset to a Parquet video dataset on Hugging Face, the upload will be much faster since the videos don't need to be re-uploaded. Under the hood, Xet storage reuses the binary chunks from the videos in Lance format for the videos in Parquet format
    • See more info here: https://huggingface.co/docs/hub/en/xet/deduplication
  • Add IterableDataset.reshard() by @​lhoestq in huggingface/datasets#7992

    Reshard the dataset if possible, i.e. split the current shards further into more shards. This increases the number of shards and the resulting dataset has num_shards >= previous_num_shards. Equality may happen if no shard can be split further.

    The resharding mechanism depends on the dataset file format:

    • Parquet: shard per row group instead of per file

... (truncated)
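
The Parquet resharding rule quoted above (one shard per row group instead of one per file) can be illustrated with a small standalone sketch. This is a toy model, not the datasets implementation: the `Shard` class, the `reshard_by_row_group` helper and the file names are hypothetical and only mirror the behaviour described in the release notes, including the guarantee that num_shards >= previous_num_shards.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Shard:
    """Hypothetical shard: one Parquet file plus the row groups it reads."""
    path: str
    row_groups: Tuple[int, ...]


def reshard_by_row_group(shards: List[Shard]) -> List[Shard]:
    """Split each shard into one shard per row group (toy model only)."""
    result: List[Shard] = []
    for shard in shards:
        if len(shard.row_groups) <= 1:
            # A shard with a single row group cannot be split further.
            result.append(shard)
        else:
            result.extend(Shard(shard.path, (rg,)) for rg in shard.row_groups)
    return result


# Two hypothetical files: one with three row groups, one with a single row group.
shards = [Shard("a.parquet", (0, 1, 2)), Shard("b.parquet", (0,))]
resharded = reshard_by_row_group(shards)

print(len(shards), "->", len(resharded))  # 2 -> 4
# Resharding again changes nothing: no shard can be split further.
print(len(reshard_by_row_group(resharded)))  # 4
```

The second call shows the equality case mentioned in the notes: once every shard covers a single row group, the shard count stays the same.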

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the dependencies (Any changes in any dependencies (new dep or its version) should be produced via Change Request on PM) and python (Pull requests that update python code) labels Feb 26, 2026
@dependabot dependabot bot requested a review from a team as a code owner February 26, 2026 05:03
@github-actions github-actions bot removed the dependencies Any changes in any dependencies (new dep or its version) should be produced via Change Request on PM label Feb 26, 2026
@AlexanderDokuchaev
Collaborator

@copilot modify this branch, bump version of dataset package in all requirements.txt and constraints.txt files

Copilot AI commented Feb 26, 2026

@AlexanderDokuchaev I've opened a new pull request, #3955, to work on those changes. Once the pull request is ready, I'll request review from you.

Bumps [datasets](https://github.com/huggingface/datasets) from 4.5.0 to 4.6.0.
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@4.5.0...4.6.0)

---
updated-dependencies:
- dependency-name: datasets
  dependency-version: 4.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot force-pushed the dependabot/pip/datasets-4.6.0 branch from 50f8a2d to c2ba504 Compare February 27, 2026 08:32
Labels

python Pull requests that update python code

2 participants