Dear TRGT Development Team,
We are writing to ask three questions about genotyping analysis with long-read data:
- For the long-read data of a specific sample from projects like the 1000 Genomes Project (see the attached figure), the data often comes from multiple sequencing batches. When genotyping multi-batch FASTQ data with TRGT, what is the standard workflow? Should we merge all batches into a single FASTQ file, perform alignment, and then genotype, or is it acceptable to genotype using data from only one batch?
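To make the first question concrete, here is a minimal sketch of the merge-first option we are asking about: concatenating per-batch FASTQ files into one file before alignment. The file names are placeholders, not actual 1000 Genomes paths, and this is only our understanding of the approach, not a workflow confirmed by the TRGT team.

```python
# Sketch: merge per-batch FASTQ files before alignment and genotyping.
# FASTQ is a plain record stream (4 lines per record), so simple
# concatenation preserves all records. File names are placeholders.
import shutil

def merge_fastq(batch_paths, merged_path):
    """Concatenate FASTQ batches into a single file."""
    with open(merged_path, "wb") as out:
        for path in batch_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)

def count_records(path):
    """Each FASTQ record spans exactly four lines."""
    with open(path) as fh:
        return sum(1 for _ in fh) // 4
```

The merged file would then be aligned once (e.g. with a long-read aligner) and the resulting BAM passed to TRGT, so that all batches contribute depth to each repeat locus.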
- Regarding the FILTER field in the VCF file generated by TRGT: the header declares ##FILTER=<ID=PASS,Description="All filters passed">. Does a PASS value indicate that the corresponding genotype has passed all quality filters and can be used directly in downstream analyses? We would also like to raise a practical point of confusion: in the VCF files we generated, the FILTER field for every genotyped variant is "." rather than PASS. Does this mean the quality of these calls is substandard and the results are unreliable?
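For reference, this is how we are tallying the FILTER column when inspecting our output (the example records in the test are illustrative, not from a real TRGT run; we only assume the standard VCF layout, where FILTER is the seventh tab-separated column):

```python
# Sketch: tally FILTER values across VCF records to see how many
# are PASS versus "." (missing). Skips "#"-prefixed header lines.
def filter_counts(vcf_lines):
    """Return a dict mapping each FILTER value to its record count."""
    counts = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        flt = fields[6]  # FILTER is the 7th mandatory VCF column
        counts[flt] = counts.get(flt, 0) + 1
    return counts
```

In our files this tally shows only ".", which per the VCF specification means "no filter applied" rather than "filter failed", but we would appreciate confirmation of how TRGT intends it.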
- Finally, an additional question: in published analyses of 1000 Genomes Project data, we noticed that for short-read data researchers generally use the BAM alignment files provided on the project's official website directly, whereas for long-read data many researchers set aside the official files and re-align the reads to produce new BAMs. If your team has any insight into the reasons behind this, could you please share them?
Thank you for your time and help!