Dear TRGT Development Team,
We are writing to ask three questions about genotyping analysis with long-read data:
- For the long-read data of a specific sample from projects like the 1000 Genomes Project (see the attached figure), the data often comes from multiple sequencing batches. When genotyping multi-batch FASTQ data with TRGT, what is the standard workflow? Should we merge all batches into a single FASTQ file, perform alignment, and then genotype, or is it acceptable to genotype using data from only one batch?
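To make the first question concrete, here is a minimal sketch of the merge-first option we are asking about: concatenating per-batch FASTQ files into one file before alignment. The file names are placeholders, not actual 1000 Genomes paths, and this is only our understanding of the approach, not a workflow confirmed by the TRGT team.

```python
# Sketch: merge per-batch FASTQ files before alignment and genotyping.
# FASTQ is a plain record stream (4 lines per record), so simple
# concatenation preserves all records. File names are placeholders.
import shutil

def merge_fastq(batch_paths, merged_path):
    """Concatenate FASTQ batches into a single file."""
    with open(merged_path, "wb") as out:
        for path in batch_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)

def count_records(path):
    """Each FASTQ record spans exactly four lines."""
    with open(path) as fh:
        return sum(1 for _ in fh) // 4
```

The merged file would then be aligned once (e.g. with a long-read aligner) and the resulting BAM passed to TRGT, so that all batches contribute depth to each repeat locus.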
- Regarding the FILTER field in the VCF file generated by TRGT: the header declares ##FILTER=<ID=PASS,Description="All filters passed">. Does a PASS value indicate that the corresponding genotype has passed all quality filters and can be used directly in downstream analyses? We would also like to raise a practical point of confusion: in the VCF files we generated, the FILTER field for every genotyped variant is "." rather than PASS. Does this mean the quality of these calls is substandard and the results are unreliable?
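For reference, this is how we are tallying the FILTER column when inspecting our output (the example records in the test are illustrative, not from a real TRGT run; we only assume the standard VCF layout, where FILTER is the seventh tab-separated column):

```python
# Sketch: tally FILTER values across VCF records to see how many
# are PASS versus "." (missing). Skips "#"-prefixed header lines.
def filter_counts(vcf_lines):
    """Return a dict mapping each FILTER value to its record count."""
    counts = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        flt = fields[6]  # FILTER is the 7th mandatory VCF column
        counts[flt] = counts.get(flt, 0) + 1
    return counts
```

In our files this tally shows only ".", which per the VCF specification means "no filter applied" rather than "filter failed", but we would appreciate confirmation of how TRGT intends it.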
- Finally, an additional question: in published analyses of 1000 Genomes Project data, we noticed that for short-read data researchers generally use the BAM alignment files provided on the project's official website directly, whereas for long-read data many researchers set aside the official files and re-align the reads to produce new BAMs. If your team has any insight into the reasons behind this, could you please share them?
Thank you for your time and help!