Clarify the multiple incompatible uses of "phred-scale".
Sometimes this refers to $-10 \log_{10}(p)$, sometimes to
$-10 \log_{10}(1-p)$, and sometimes to something normalised so that $p$
isn't really a probability at all.
Note that CNL, CNP and CNQ don't mention phred anywhere in their
short descriptions, and Phred appears only in the long description for
CNQ, so I applied the same logic to PL, PP (is this correct?) and PQ.
Also clarified the "VCF tag naming conventions" part. I changed
phred-scale in one part there to phred-true-scale. I'm not so happy
with that, but as it's immediately followed by the formula I think
it's clear.
\item HQ (Integer): Haplotype qualities, two comma separated phred qualities.
517
517
\item MQ (Integer): RMS mapping quality, similar to the version in the INFO field.
518
-
\item PL (Integer): The phred-scaled genotype likelihoods rounded to the closest integer, and otherwise defined in the same way as the GL field.
519
-
\item PP (Integer): The phred-scaled genotype posterior probabilities rounded to the closest integer, and otherwise defined in the same way as the GP field.
518
+
\item PL (Integer): The $\log_{10}$-scaled genotype likelihoods rounded to the closest integer, and otherwise defined in the same way as the GL field.
519
+
\item PP (Integer): The $\log_{10}$-scaled genotype posterior probabilities rounded to the closest integer, and otherwise defined in the same way as the GP field.
520
520
\item PQ (Integer): Phasing quality, the phred-scaled probability that alleles are ordered incorrectly in a heterozygote (against all other members in the phase set).
521
521
We note that we have not yet included the specific measure for precisely defining ``phasing quality''; our intention for now is simply to reserve the PQ tag for future use as a measure of phasing quality.
522
522
\item PS (non-negative 32-bit Integer): Phase set, defined as a set of phased genotypes to which this genotype belongs.
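As an illustration of the PL and PP definitions in this hunk, the following Python sketch derives integer values from GL and GP. It assumes the common $-10\log_{10}$ scaling for both tags, which is a widely used convention rather than something this diff states outright. The function names `gl_to_pl` and `gp_to_pp` are hypothetical, the input values are made up, and zero probabilities are not handled.

```python
import math


def gl_to_pl(gl_values):
    """Scale log10 genotype likelihoods (GL) by -10 and round to integers.

    Assumption: PL = round(-10 * GL); no normalisation is applied here.
    Note that Python's round() sends exact .5 ties to the nearest even integer.
    """
    return [round(-10 * gl) for gl in gl_values]


def gp_to_pp(gp_values):
    """Convert linear-scale posterior probabilities (GP) to integers.

    Assumption: PP = round(-10 * log10(GP)); every probability must be > 0.
    """
    return [round(-10 * math.log10(p)) for p in gp_values]


if __name__ == "__main__":
    # Made-up values for three genotypes of one sample.
    print(gl_to_pl([-0.3, -1.2, -4.0]))
    print(gp_to_pp([0.9, 0.09, 0.01]))
```

Rounding discards information, so values derived this way cannot be converted back to the exact GL or GP values.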
@@ -544,13 +544,14 @@ \subsection{VCF tag naming conventions}
544
544
\begin{itemize}
545
545
\item The `L' suffix means \emph{likelihood} as log-likelihood in the sampling distribution, $\log_{10} \Pr(\mathrm{Data}|\mathrm{Model})$.
546
546
Likelihoods are represented as $\log_{10}$ scale, thus they are negative numbers (e.g.\ GL, CNL).
547
-
The likelihood can be also represented in some cases as phred-scale in a separate tag (e.g.\ PL).
547
+
The likelihood can also be represented in some cases on a phred-true scale ($-10\log_{10}(\mathrm{probability\_of\_being\_correct})$) in a separate tag (e.g.\ PL).
548
+
In this case they may be normalised so the most likely event has a score of 0.
548
549
549
550
\item The `P' suffix means \emph{probability} as linear-scale probability in the posterior distribution, which is $\Pr(\mathrm{Model}|\mathrm{Data})$. Examples are GP, CNP.
550
551
551
552
\item The `Q' suffix means \emph{quality} as log-complementary-phred-scale posterior probability, $-10\log_{10} \Pr(\mathrm{Data}|\mathrm{Model})$, where the model is the most likely genotype that appears in the GT field.
552
553
Examples are GQ, CNQ.
553
-
The fixed site-level QUAL field follows the same convention (represented as a phred-scaled number).
554
+
The fixed site-level QUAL field follows the same convention (represented as a phred-scaled number with $\mathrm{QUAL} = -10\log_{10}(\mathrm{probability\_of\_being\_incorrect})$).
554
555
\end{itemize}
555
556
556
557
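The distinction drawn in this hunk can be checked numerically. The sketch below is a minimal Python illustration, not part of the specification: the helper names `phred_of_error`, `neg10log_of_correct` and `normalise` are hypothetical, and the example probability and scores are made up.

```python
import math


def phred_of_error(p_incorrect):
    """Phred score: -10 * log10(probability of being incorrect)."""
    return -10 * math.log10(p_incorrect)


def neg10log_of_correct(p_correct):
    """The 'phred-true' value: -10 * log10(probability of being correct)."""
    return -10 * math.log10(p_correct)


def normalise(scores):
    """Shift scores so that the most likely event (lowest score) is 0."""
    best = min(scores)
    return [s - best for s in scores]


if __name__ == "__main__":
    p_correct = 0.999
    # Same event, two very different numbers depending on which
    # probability is put on the -10*log10 scale.
    print(round(phred_of_error(1 - p_correct), 2))
    print(round(neg10log_of_correct(p_correct), 4))
    # Normalisation as described for PL-style values.
    print(normalise([3, 12, 40]))
```

With a probability of being correct of 0.999, the first value is about 30 while the second is close to 0, which is why the two scales should not share one name.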
@@ -2085,6 +2086,7 @@ \section{List of changes}
2085
2086
\subsection{Changes to VCFv4.3}
2086
2087
2087
2088
\begin{itemize}
2089
+
\item Clarify distinction between Phred ($-10 \log_{10}(\mathrm{p\_of\_incorrect})$) and $-10 \log_{10}(\mathrm{p\_of\_correct})$.
2088
2090
\item More strict language: ``should'' replaced with ``must'' where appropriate
2089
2091
\item Tables with Type and Number definitions for INFO and FORMAT reserved keys