Dataset Inspector is a desktop UI for inspecting local Lightning-AI/litData shards, MosaicML Streaming (MDS) shards, and WebDataset tar shards, with support for previewing Hugging Face and Zenodo datasets directly over HTTP without full downloads. Supported platforms: Windows, macOS, and Linux (web planned).
dataset-inspector (this repo): the dataset loading and inspection UI, plus the core data preview workflow.
- Auto-detect local LitData indexes/chunks, MosaicML MDS, and WebDataset shards.
- Enhanced Hugging Face support: direct Parquet streaming via DuckDB, which previews datasets that huggingface.co cannot display.
- Preview Zenodo records and browse ZIP/TAR entries with HTTP range requests.
- Rich previews for JSON/text, images, audio, and video.
- Open fields with the system default app.
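To illustrate how ZIP entries can be browsed with HTTP range requests without downloading the whole archive, here is a minimal Python sketch. It is not the application's actual implementation. The `fetch` callback and the `list_zip_entries` name are hypothetical; `fetch(start, end)` is assumed to return bytes `[start, end)` of the remote file, for example by sending a `Range: bytes=start-(end-1)` header. ZIP64 archives and non-UTF-8 filename encodings are not handled.

```python
import struct
from typing import Callable, List

# Hypothetical callback type: returns bytes [start, end) of the remote file.
FetchRange = Callable[[int, int], bytes]

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory (EOCD) signature
CDH_SIG = b"PK\x01\x02"    # central-directory file header signature
EOCD_MIN = 22              # fixed EOCD size, excluding the trailing comment
MAX_COMMENT = 65535        # the EOCD comment length is a 16-bit field


def list_zip_entries(total_size: int, fetch: FetchRange) -> List[str]:
    """List entry names using two ranged reads: the tail, then the directory."""
    # 1) Read only the tail of the archive, where the EOCD record lives.
    tail_len = min(total_size, EOCD_MIN + MAX_COMMENT)
    tail = fetch(total_size - tail_len, total_size)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_MIN > len(tail):
        raise ValueError("EOCD record not found; not a ZIP or truncated")

    # 2) Parse the EOCD to learn where the central directory is.
    (_sig, _disk, _cd_disk, _n_here, n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack(
        "<4sHHHHIIH", tail[pos:pos + EOCD_MIN])

    # 3) Fetch just the central directory and walk its fixed 46-byte headers.
    cd = fetch(cd_offset, cd_offset + cd_size)
    names: List[str] = []
    off = 0
    for _ in range(n_total):
        if cd[off:off + 4] != CDH_SIG:
            raise ValueError("corrupt central directory")
        name_len, extra_len, comment_len = struct.unpack(
            "<HHH", cd[off + 28:off + 34])
        raw_name = cd[off + 46:off + 46 + name_len]
        # Assumption: names are UTF-8 (legacy archives may use CP437).
        names.append(raw_name.decode("utf-8", errors="replace"))
        off += 46 + name_len + extra_len + comment_len
    return names
```

The design point is that the file data itself is never transferred: only the archive tail and the central directory are read, so listing cost scales with the number of entries rather than the archive size.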
Screenshots: Local LitData shards | Local WebDataset tar shards | Hugging Face dataset preview | Zenodo record preview
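For local WebDataset tar shards, the shard, sample, and field levels follow from the WebDataset naming convention: tar members that share the same name up to the first dot of the basename belong to one sample, and the remainder of the name identifies the field. The sketch below indexes a shard that way. It is an illustration under that assumption, not the inspector's own code, and `split_key` and `index_shard` are hypothetical names.

```python
import tarfile
from typing import Dict, List, Tuple


def split_key(member_name: str) -> Tuple[str, str]:
    """Split 'dir/000.seg.png' into sample key 'dir/000' and field 'seg.png'."""
    dirname, _, base = member_name.rpartition("/")
    # The key ends at the FIRST dot of the basename; the rest is the field.
    stem, _, ext = base.partition(".")
    key = f"{dirname}/{stem}" if dirname else stem
    return key, ext


def index_shard(tar_path: str) -> Dict[str, List[str]]:
    """Map each sample key to the list of field extensions found in the shard."""
    samples: Dict[str, List[str]] = {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-file members
            key, ext = split_key(member.name)
            samples.setdefault(key, []).append(ext)
    return samples
```

Only tar headers are read while building the index; field contents would be extracted later, when a specific field is selected for preview.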
- Download Dataset Inspector installers from Releases.
- Paste a local dataset path or Hugging Face/Zenodo URL, then press Load.
- Local shards: pick a shard/chunk -> item/sample -> field, then preview fields.
- Hugging Face: pick a config/split -> row -> field (add a token if needed).
- Zenodo: pick a record -> file -> entry (ZIP/TAR), then preview/open files.
- Report issues/feature requests: https://github.com/binbinsh/dataset-inspector/issues
- LitData: docs/litdata.md
- MosaicML MDS: docs/mosaicml.md
- WebDataset: docs/webdataset.md
- Hugging Face: docs/huggingface.md
- Zenodo: docs/zenodo.md
- Audio preview: docs/audio.md
- Development: docs/development.md



