binbinsh/dataset-inspector

Dataset Inspector

Dataset Inspector is a desktop UI for inspecting local Lightning-AI LitData shards, MosaicML Streaming (MDS) shards, and WebDataset tar shards. It can also preview Hugging Face and Zenodo datasets directly over HTTP, without downloading them in full. Supported platforms: Windows, macOS, and Linux (a web version is planned).

Scope

  • dataset-inspector (this repo): dataset loading/inspection UI and core data preview workflow.

Features

  • Auto-detect local LitData indexes/chunks, MosaicML MDS, and WebDataset shards.
  • Enhanced Hugging Face support: Parquet files are streamed directly via DuckDB, so you can preview datasets that huggingface.co cannot display.
  • Preview Zenodo records and browse ZIP/TAR entries with HTTP range requests.
  • Rich previews for JSON/text, images, audio, and video.
  • Open fields with the system default app.
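
To illustrate how ZIP entries can be browsed with HTTP range requests, here is a minimal Python sketch. It is not the application's actual implementation: `list_zip_entries`, `ZipEntry`, and the `fetch` callable are hypothetical names, and `fetch(start, end)` stands in for an HTTP request with a `Range: bytes=start-end` header. The sketch reads only the tail of the archive to locate the end-of-central-directory record, then fetches the central directory itself. It assumes a plain (non-ZIP64) archive and UTF-8 entry names.

```python
import struct
from typing import Callable, List, NamedTuple

# Hypothetical abstraction: fetch(start, end) returns bytes start..end inclusive,
# the same semantics as an HTTP "Range: bytes=start-end" request.
RangeFetch = Callable[[int, int], bytes]

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory record signature
CDIR_SIG = b"PK\x01\x02"   # central directory file header signature
EOCD_LEN = 22              # fixed part of the EOCD record
CDIR_LEN = 46              # fixed part of a central directory header


class ZipEntry(NamedTuple):
    name: str
    compressed_size: int
    size: int
    header_offset: int     # offset of the entry's local file header


def list_zip_entries(fetch: RangeFetch, total_size: int) -> List[ZipEntry]:
    """List ZIP entries using two ranged reads (no ZIP64 support)."""
    # 1. Read the tail: the EOCD record plus an optional comment (max 65535 bytes).
    tail_len = min(total_size, EOCD_LEN + 65535)
    tail = fetch(total_size - tail_len, total_size - 1)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("end-of-central-directory record not found")
    _, _, _, _, count, cd_size, cd_offset, _ = struct.unpack(
        "<4s4H2LH", tail[pos:pos + EOCD_LEN]
    )
    if count == 0:
        return []

    # 2. Read the central directory, which describes every entry.
    cdir = fetch(cd_offset, cd_offset + cd_size - 1)
    entries: List[ZipEntry] = []
    off = 0
    for _ in range(count):
        fields = struct.unpack("<4s6H3L5H2L", cdir[off:off + CDIR_LEN])
        if fields[0] != CDIR_SIG:
            raise ValueError("corrupt central directory")
        comp_size, size = fields[8], fields[9]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        header_offset = fields[16]
        raw_name = cdir[off + CDIR_LEN:off + CDIR_LEN + name_len]
        # Simplification: names are decoded as UTF-8 regardless of the flag bits.
        name = raw_name.decode("utf-8", errors="replace")
        entries.append(ZipEntry(name, comp_size, size, header_offset))
        off += CDIR_LEN + name_len + extra_len + comment_len
    return entries
```

With the entry list in hand, a viewer only needs one further ranged read per previewed file, starting at that entry's `header_offset`, instead of downloading the whole archive.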

Local LitData shards

Local WebDataset tar shards

Hugging Face dataset preview

Zenodo record preview

Usage

  1. Download the Dataset Inspector installer for your platform from Releases.
  2. Paste a local dataset path or Hugging Face/Zenodo URL, then press Load.
  3. Local shards: pick a shard/chunk -> item/sample -> field, then preview fields.
  4. Hugging Face: pick a config/split -> row -> field (add a token if needed).
  5. Zenodo: pick a record -> file -> entry (ZIP/TAR), then preview/open files.
  6. Report issues/feature requests: https://github.com/binbinsh/dataset-inspector/issues
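
For local WebDataset shards (step 3), the shard -> sample -> field hierarchy follows from the WebDataset naming convention: files in a tar that share the same path up to the first dot of the file name form one sample, and the remainder of the name is the field. The Python sketch below builds such an index with the standard library. It is an illustration only, not the application's code; `split_key` and `index_shard` are hypothetical helper names.

```python
import tarfile
from typing import BinaryIO, Dict, Tuple


def split_key(path: str) -> Tuple[str, str]:
    """Split 'dir/000.seg.png' into sample key 'dir/000' and field 'seg.png'."""
    directory, _, filename = path.rpartition("/")
    stem, dot, ext = filename.partition(".")
    if not dot:
        # No extension: treat the whole path as the key with an empty field name.
        return path, ""
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def index_shard(fileobj: BinaryIO) -> Dict[str, Dict[str, int]]:
    """Map each sample key to its fields and their sizes in bytes."""
    samples: Dict[str, Dict[str, int]] = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-file members
            key, field = split_key(member.name)
            samples.setdefault(key, {})[field] = member.size
    return samples
```

Calling `index_shard(open("shard-000000.tar", "rb"))` on a shard (the file name here is a placeholder) returns a dictionary whose keys are the samples and whose values list the fields available for preview.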

Docs
