Data Cockpit

Data Cockpit is an interactive IPython widget built on top of the Dataplug framework. It enables scientists and engineers to:

- Upload and browse datasets in Amazon S3
- Explore curated public and Metaspace collections
- Benchmark performance to discover optimal batch sizes
- Partition a variety of scientific data types into chunks or batches
- Integrate seamlessly into Jupyter notebooks for elastic, parallel workloads

Why Data Cockpit?

Built on Dataplug’s Cloud-Aware Partitioning

Dataplug is a client-side Python framework for dynamic, zero-cost data slicing of unstructured scientific data stored in object stores like S3. It:

- Pre-processes data in a read-only fashion, building lightweight indexes decoupled from the raw objects
- Exploits S3 byte-range reads to parallelize high-bandwidth access across many workers
- Supports a plug-in interface for multiple domains:
  - Generic: CSV, raw text
  - Genomics: FASTA, FASTQ, VCF
  - Geospatial: LiDAR, Cloud-Optimized Point Cloud (COPC), Cloud-Optimized GeoTIFF (COG)
  - Metabolomics: imzML
- Allows re-partitioning with different strategies without rewriting the original data
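The indexing and byte-range ideas above can be illustrated with a small, self-contained sketch. It is not Dataplug's implementation or API: the function names (build_line_index, partition_ranges, range_header, read_range) are hypothetical, and an in-memory bytes object stands in for an S3 object. The index records where each line (record) starts, so partitions can be cut on record boundaries and re-cut with a different partition count without touching the raw data. On S3, each range would be fetched with an HTTP Range header such as bytes=0-12 (end inclusive), for example via boto3's get_object(Bucket=..., Key=..., Range=...).

```python
from typing import List, Tuple


def build_line_index(data: bytes) -> List[int]:
    """Return the byte offset where each line (record) starts.

    The raw data is only read, never modified; the index is a separate,
    lightweight structure that can be stored next to the object.
    """
    offsets = [0]
    pos = data.find(b"\n")
    while pos != -1:
        if pos + 1 < len(data):  # ignore the position after a trailing newline
            offsets.append(pos + 1)
        pos = data.find(b"\n", pos + 1)
    return offsets


def partition_ranges(index: List[int], total_size: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split the object into up to n_parts half-open byte ranges [start, end)
    whose boundaries fall on record starts, so no record is cut in half."""
    if total_size == 0:
        return []
    n_records = len(index)
    n_parts = max(1, min(n_parts, n_records))
    ranges = []
    for i in range(n_parts):
        first = i * n_records // n_parts
        last = (i + 1) * n_records // n_parts  # exclusive record index
        start = index[first]
        end = index[last] if last < n_records else total_size
        ranges.append((start, end))
    return ranges


def range_header(start: int, end: int) -> str:
    """HTTP Range header value for [start, end); HTTP byte ranges are inclusive."""
    return f"bytes={start}-{end - 1}"


def read_range(data: bytes, start: int, end: int) -> bytes:
    """Stand-in for an S3 byte-range GET on the object."""
    return data[start:end]


csv = b"id,value\n1,a\n2,b\n3,c\n4,d\n"
idx = build_line_index(csv)  # built once, reused for every partitioning
for parts in (2, 3):         # re-partition without rewriting the data
    ranges = partition_ranges(idx, len(csv), parts)
    print(parts, [range_header(s, e) for s, e in ranges])
    chunks = [read_range(csv, s, e) for s, e in ranges]
    assert b"".join(chunks) == csv
```

Because each range can be read independently, every worker can issue its own range request in parallel; changing the number of partitions only recomputes the list of ranges from the existing index.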

What Data Cockpit Adds

While Dataplug focuses on efficient data slicing, Data Cockpit provides an end-to-end Jupyter UI that:

- Uploads your local files directly to an S3 bucket
- Browses existing buckets or public datasets from the Registry of Open Data on AWS
- Runs benchmarks across a configurable range of batch sizes to find the one with the highest throughput
- Processes and partitions your data with one click, displaying progress and results entirely in the notebook
- Retrieves partitions via get_data_slices(), which returns the Dataplug data slices (metadata) for downstream processing
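The benchmarking step can be pictured as timing the same workload at several batch sizes and keeping the one with the highest throughput. The sketch below is a hypothetical illustration, not Data Cockpit's benchmark code: make_batches, process_batch, and benchmark are invented names, process_batch is a placeholder workload with a simulated per-batch overhead, the candidate batch sizes are arbitrary, and a thread pool stands in for the parallel workers that would consume the data slices.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


def make_batches(items: Sequence, batch_size: int) -> List[Sequence]:
    """Group items (e.g. data slices) into consecutive batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_batch(batch: Sequence[int]) -> int:
    """Hypothetical workload: a fixed per-batch overhead plus per-item work."""
    time.sleep(0.002)  # simulated per-batch cost (e.g. one request per batch)
    return sum(x * x for x in batch)


def benchmark(items: Sequence, batch_sizes: Sequence[int],
              worker: Callable[[Sequence], object],
              max_workers: int = 4) -> Tuple[int, Dict[int, float]]:
    """Return the batch size with the highest throughput (items per second)
    and the measured throughput for every candidate."""
    throughput: Dict[int, float] = {}
    for size in batch_sizes:
        batches = make_batches(items, size)
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            list(pool.map(worker, batches))  # wait for every batch to finish
        elapsed = time.perf_counter() - start
        throughput[size] = len(items) / elapsed
    best = max(throughput, key=throughput.get)
    return best, throughput


items = list(range(2_000))
best, results = benchmark(items, batch_sizes=[10, 50, 200, 1000], worker=process_batch)
for size, rate in results.items():
    print(f"batch_size={size:>5}: {rate:,.0f} items/s")
print("fastest batch size:", best)
```

In this toy workload the fixed per-batch overhead favors large batches. With real data, larger batches also use more memory per worker and can leave some workers idle, so the best size depends on the dataset and has to be measured.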

Installation

pip install cloud-data-cockpit

To install Data Cockpit together with the geospatial extras (the quotes keep shells such as zsh from expanding the square brackets):

pip install "cloud-data-cockpit[geospatial]"
