This is about the Hugging Face datasets.
Many of them are compressed CSV/JSON dumps, which cannot be viewed or queried in the Hugging Face UI. Have you considered publishing them in Parquet or DuckDB file formats instead?
I have some scripts that process the llama3*.zip files into Parquet/DuckDB. They produce an entity -> event -> event graph. I'm not sure about the concepts graph.
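For illustration, here is a minimal sketch of what such a conversion could look like. It is not my actual script: the function names (`iter_csv_rows`, `zip_to_parquet`) and the `source_file` column are made up for this example, and it assumes the archive holds UTF-8 CSV files that all share the same columns. Writing Parquet needs `pyarrow`, which is imported lazily so the reading part works with the standard library alone.

```python
import csv
import io
import zipfile


def iter_csv_rows(zip_path):
    """Yield (member_name, row_dict) for every CSV file inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip JSON or other members in this sketch
            with archive.open(name) as raw:
                # Assumption: members are UTF-8 encoded with a header row.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    yield name, row


def zip_to_parquet(zip_path, parquet_path):
    """Write all CSV rows from the archive into one Parquet file.

    Hypothetical helper: assumes every CSV member has the same columns
    and that the data fits in memory. All values are kept as strings.
    """
    import pyarrow as pa  # third-party dependency, imported lazily
    import pyarrow.parquet as pq

    # Tag each row with the archive member it came from.
    rows = [dict(row, source_file=name) for name, row in iter_csv_rows(zip_path)]
    table = pa.Table.from_pylist(rows)
    pq.write_table(table, parquet_path)
    return table.num_rows
```

Once the Parquet file exists, DuckDB can query it in place, e.g. `SELECT * FROM 'graph.parquet' LIMIT 10;` (the file name here is only a placeholder).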