Information about Supplementary_Table2_datasetsQC.xlsx  #34

@smanfri

Good morning,

I'm a Computer Science student at Università degli Studi di Milano, and for my thesis I am assessing several pipelines for the analysis of SARS-CoV-2 samples.
In order to select the best pipeline for our requirements, I'm using the benchmark datasets available here.
I found Supplementary_Table2 in your paper (Xiaoli L, Hagey JV, Park DJ, Gulvik CA, Young EL, Alikhan N-F, Lawsin A, Hassell N, Knipe K, Oakeson KF, Retchless AC, Shakya M, Lo C-C, Chain P, Page AJ, Metcalf BJ, Su M, Rowell J, Vidyaprakash E, Paden CR, Huang AD, Roellig D, Patel K, Winglee K, Weigand MR, Katz LS. 2022. Benchmark datasets for SARS-CoV-2 surveillance bioinformatics. PeerJ 10:e13821 http://doi.org/10.7717/peerj.13821), and I would also like to use the data it contains for my evaluations (not only the .tsv file available for each dataset).
I'm writing because I can't work out how the 'Total reads' column is calculated. I computed this value with FastQC (the 'Total Sequences' field) and also by counting the reads in the original FASTQ files, but neither result matches the numbers published in Supplementary_Table2.
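For reference, the read counting mentioned above can be sketched as follows. This is a minimal sketch, assuming standard 4-line FASTQ records (no multi-line sequences) and optional gzip compression; the function names are hypothetical. It also shows one convention difference that could plausibly produce mismatched totals for paired-end data: counting a single file (as FastQC's 'Total Sequences' does per file) versus summing R1 and R2. Which convention, if either, Supplementary_Table2 uses is not confirmed here.

```python
import gzip


def count_fastq_reads(path):
    """Count records in one FASTQ file (plain or .gz).

    Assumes the standard layout of exactly 4 lines per record
    (header, sequence, '+', qualities); multi-line FASTQ is not handled.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def total_reads(paths):
    """Sum read counts over several files, e.g. the R1 and R2 of a paired-end run."""
    return sum(count_fastq_reads(p) for p in paths)
```

For a paired-end sample, `count_fastq_reads("sample_R1.fastq.gz")` gives the number of read pairs, while `total_reads(["sample_R1.fastq.gz", "sample_R2.fastq.gz"])` gives twice that; comparing both against the table is a quick way to rule this convention in or out.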

Do you know why the numbers differ? Is it possible that Supplementary_Table2 is outdated relative to the current version of the datasets?
If so, which version of the datasets matches Supplementary_Table2 and was used in your paper?

Thank you very much for your time :)

Best regards,
Sara Manfredi
