Consider columnar storage #38

@adsharma

This is about the Hugging Face datasets.

Many of them are compressed CSV/JSON dumps, which can't be viewed or queried in the Hugging Face UI. Have you considered using the Parquet or DuckDB file formats?

I have some scripts that process the llama3*.zip files into Parquet/DuckDB. They produce an entity -> event -> event graph. I'm not sure about the concepts graph.