Understand PMC_128 data #4

@james20141606

Hi, I downloaded the chunked PMC dataset from http://nlp.dmis.korea.edu/projects/selfbiorag-jeong-et-al-2024/data/retriever/PMC_128.tar.gz
After extracting it, I found the following files:

PMC_128_Abs_Articles.json  PMC_128_Main_Articles.json
PMC_128_Abs_Embeds.npy     PMC_128_Main_Embeds.npy
PMC_128_idx_array.npy

I assume these files contain everything?

As for the smaller files under PMC_128_temporary, are they just the pieces that were merged into the files above, with the same content?
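
For reference, this is roughly how I would sanity-check the merged files: a minimal sketch that compares the number of entries in an articles JSON with the number of rows in the matching embeddings array. The helper name `count_rows` is my own, and I am assuming (not confirmed by the repository) that each JSON file holds one entry per chunk and that each `.npy` file is a plain 2-D array with one row per chunk.

```python
import json

import numpy as np


def count_rows(articles_json, embeds_npy):
    """Return (number of JSON entries, number of embedding rows).

    Assumption: the JSON top level is a list or dict with one entry per
    chunk, and the .npy file is an array whose first axis indexes chunks.
    """
    with open(articles_json, encoding="utf-8") as f:
        articles = json.load(f)
    # mmap_mode="r" maps the file instead of reading it all into memory,
    # which matters for large embedding arrays.
    embeds = np.load(embeds_npy, mmap_mode="r")
    return len(articles), embeds.shape[0]
```

Calling `count_rows("PMC_128_Abs_Articles.json", "PMC_128_Abs_Embeds.npy")` and the same for the `Main` pair should give two equal numbers per pair if my assumption about the layout is right. Note that the JSON files are still loaded fully into memory here.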
