
Runtime for large datasets. #84

@sdsilva10

Hi,

I am trying to generate cohort-level QC metrics with peddy. My sample size is about 187,000. As input, I have provided a gzipped VCF and a FAM (PLINK format) file for these samples. When I run the command for the QC plots, all sample IDs are listed and the terminal then prints "ped_check". However, nothing progresses beyond this stage, and the process is still running past the 24-hour mark.

I ran this on an HPC node with:
CPU: Intel(R) Xeon(R) Gold 6240 @ 2.60GHz
RAM: 180 GB

Is there a limitation on the input sample size?
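A rough estimate may help frame the question. This sketch assumes that the `ped_check` step compares every pair of samples and holds one dense n × n matrix of 64-bit values in memory. That is an assumption about peddy's internals, not confirmed here, and the helper names below are hypothetical rather than part of peddy. Under that assumption, work and memory grow quadratically with the number of samples:

```python
def pair_count(n: int) -> int:
    """Number of unordered sample pairs, i.e. n choose 2."""
    return n * (n - 1) // 2


def dense_matrix_bytes(n: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for one n x n matrix of fixed-width values (8 bytes = float64)."""
    return n * n * bytes_per_value


n_samples = 187_000
print(f"pairs: {pair_count(n_samples):,}")
print(f"one n x n float64 matrix: {dense_matrix_bytes(n_samples) / 1e9:.1f} GB")
```

For 187,000 samples this gives about 17.5 billion pairs and roughly 280 GB for a single such matrix, which exceeds the node's 180 GB of RAM. If the assumption holds, this would be consistent with the run stalling at `ped_check`.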
