Commit 820dc80

Documented allele values in VCF.
Closes #298
1 parent e1a3808 commit 820dc80

1 file changed: +10 -1 lines changed

python/tskit/trees.py

Lines changed: 10 additions & 1 deletion
@@ -3048,7 +3048,16 @@ def write_vcf(
 Each individual in the output is identified by a string; these are the
 VCF "sample" names. By default, these are of the form ``tsk_0``,
 ``tsk_1`` etc, up to the number of individuals, but can be manually
-specified using the ``individual_names`` argument.
+specified using the ``individual_names`` argument. We do not check
+for duplicates in this array, or perform any checks to ensure that
+the output VCF is well-formed.
+
+The REF value in the output VCF is the ancestral allele for a site
+and ALT values are the remaining alleles. It is important to note,
+therefore, that for real data this means that the REF value for a given
+site **may not** be equal to the reference allele. We also do not
+check that the alleles result in a valid VCF---for example, it is possible
+to use the tab character as an allele, leading to a broken VCF.
 
 The ``position_transform`` argument provides a way to flexibly translate
 the genomic location of sites in tskit to the appropriate value in VCF.
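
Since the docstring added in this commit states that ``write_vcf`` does not check ``individual_names`` for duplicates or validate alleles, a caller may want to run such checks before writing. The following is a minimal sketch of what those checks might look like. The helper names ``duplicate_names`` and ``suspicious_alleles`` are hypothetical and not part of tskit, and the allele rule (flag anything empty, or containing whitespace or a comma) is an assumed heuristic rather than the full VCF allele grammar.

```python
from collections import Counter


def duplicate_names(individual_names):
    """Return the names that occur more than once, in first-seen order.

    Hypothetical helper: tskit itself performs no such check.
    """
    counts = Counter(individual_names)
    return [name for name, n in counts.items() if n > 1]


def suspicious_alleles(alleles):
    """Return alleles likely to break a tab-delimited VCF record.

    Assumption: an allele is flagged if it is empty or contains
    whitespace or a comma. This is a conservative heuristic, not an
    implementation of the VCF specification's allele rules.
    """
    bad = []
    for allele in alleles:
        if allele is None:
            # Treated here as a missing-data marker, not an allele string.
            continue
        if allele == "" or "," in allele or any(ch.isspace() for ch in allele):
            bad.append(allele)
    return bad


if __name__ == "__main__":
    print(duplicate_names(["tsk_0", "tsk_1", "tsk_0"]))
    print(suspicious_alleles(["A", "T", "\t"]))
```

With a tree sequence ``ts``, one possible use is to call ``duplicate_names`` on the list intended for ``individual_names``, and ``suspicious_alleles(v.alleles)`` for each ``v`` in ``ts.variants()``, before calling ``write_vcf``. Neither check addresses the REF caveat above: the ancestral allele may still differ from the reference allele in real data.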
