This repository was archived by the owner on Dec 12, 2024. It is now read-only.

Commit d93747b

Merge pull request #471 from jeromekelleher/final-submission-tweaks
Update "probabilistic sampling" bit
2 parents 0cb62aa + e1329f2 commit d93747b


5 files changed

+144
-40
lines changed


Makefile

Lines changed: 13 additions & 1 deletion
@@ -15,7 +15,7 @@ ILLUSTRATIONS=\
 illustrations/cell-lines.pdf \
 illustrations/simplification-with-edges.pdf \

-all: paper.pdf response-to-reviewers.pdf
+all: paper.pdf response-to-reviewers.pdf response-to-reviewers-2.pdf

 paper.pdf: paper.tex paper.bib ${DATA} ${FIGURES} ${ILLUSTRATIONS}
 	pdflatex -shell-escape paper.tex
@@ -102,3 +102,15 @@ review-diff.pdf: review-diff.tex

 response-to-reviewers.pdf: response-to-reviewers.tex
 	pdflatex $<
+
+review-diff-2.tex: paper.tex
+	latexdiff reviewed-paper-2.tex paper.tex > review-diff-2.tex
+
+review-diff-2.pdf: review-diff-2.tex
+	pdflatex review-diff-2.tex
+	pdflatex review-diff-2.tex
+	bibtex review-diff-2
+	pdflatex review-diff-2.tex
+
+response-to-reviewers-2.pdf: response-to-reviewers-2.tex
+	pdflatex $<

cover-letter/Makefile

Lines changed: 4 additions & 1 deletion
@@ -1,7 +1,10 @@
-all: cover-letter.pdf cover-letter-resubmit.pdf
+all: cover-letter.pdf cover-letter-resubmit.pdf cover-letter-resubmit-2.pdf

 cover-letter.pdf: cover-letter.tex
 	pdflatex cover-letter.tex

 cover-letter-resubmit.pdf: cover-letter-resubmit.tex
 	pdflatex cover-letter-resubmit.tex
+
+cover-letter-resubmit-2.pdf: cover-letter-resubmit-2.tex
+	pdflatex cover-letter-resubmit-2.tex
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+\documentclass{letter}
+
+\signature{Jerome Kelleher}
+
+\address{Big Data Institute\\University of Oxford}
+\begin{document}
+
+\begin{letter}{GENETICS}
+
+\opening{Dear Graham,}
+
+I am writing on behalf of my coauthors to resubmit our
+manuscript entitled
+\emph{A general and efficient representation of ancestral recombination
+graphs}.
+
+We are delighted that it is potentially suitable for publication in GENETICS,
+and have endeavored to address the points you have raised.
+
+We have attached a detailed point-by-point response in the
+\texttt{response-to-reviewers-2.pdf} file, along with a
+latex-diff of the differences between the current and previous submissions.
+
+Thank you again for your careful and helpful input throughout this process.
+
+\closing{Sincerely,}
+
+\end{letter}
+\end{document}

paper.tex

Lines changed: 10 additions & 11 deletions
@@ -68,7 +68,7 @@
 % This rapid progress has led to a diversity of ARG definitions and representations.
 Classical formalisms have focused on mapping
 coalescence and recombination events to the nodes in an ARG.
-This approach is out of step with many modern developments, however,
+This approach is out of step with some modern developments, however,
 which do not represent genetic inheritance in terms of these events
 or explicitly infer them.
 We present a simple formalism that defines an ARG in terms
@@ -507,18 +507,17 @@ \section{Event ARGs}
 Aside from these practical challenges, there is also a deeper
 issue with the implicit strategy of basing an ARG data structure on
 recording events and their properties (e.g.\ the crossover breakpoint
-for a recombination event). This approach
+for a recombination event).
+This approach
 requires all events to be recorded explicitly, and does not
-provide an obvious mechanism for either aggregating multiple events
-or expressing uncertainty about them. This is not a
-problem when describing the results of simulations, where all details
-are perfectly known. However, it can be an issue when we wish to
-formally describe the output of various inference methods, particularly
-those that avoid inferring events that are not \emph{knowable} from the data:
-a useful approach as datasets approach the population scale~\citep[e.g.][]{
+provide an obvious mechanism for aggregating multiple, potentially
+unresolvable, events.
+As datasets approach the population scale~\citep[e.g.][]{
 turnbull2018hundred, bycroft2018genome,hayes20191000,
 Ros-Freixedes2020,karczewski2020mutational,tanjo2021practical,
-halldorsson2022sequences}.
+halldorsson2022sequences} representing such uncertainty
+directly through the data structure is a useful alternative to
+classical methods based on probabilistic sampling.

 % There is also a certain clarity gained by explicitly modelling nodes
 % in the inheritance graph as genomes.
@@ -1129,7 +1128,7 @@ \section{Discussion}
 The emerging ARG software ecosystem could similarly benefit
 from the adoption of such shared community infrastructure
 to handle the mundane and time-consuming details of data interchange.
-The \texttt{tskit} library (Section~\ref{sec-efficiency})
+The \texttt{tskit} library
 is a high-quality open-source gARG implementation,
 with proven efficiency and
 scalability~\citep[e.g.][]{anderson2022genes,zhan2023towards},

response-to-reviewers-2.tex

Lines changed: 88 additions & 27 deletions
@@ -57,11 +57,23 @@ \section*{Response to the editor}
 \section*{Associate Editor's comments}

 \begin{point}
-My remaining broad concern is that the paper is still in places somewhat narrow about the goals of future ARG development. I certainly see the practical utility of dropping inference down to some minimum "knowable" structure that can be reconstructed using deterministic algorithms for very large datasets. However, probabilistic reconstructions of some form of ARG with more explicit events is also a reasonable goal moving forwards (e.g. for some applications we may want a subset of the recombination events explicitly included). There are a few places where the paper still comes across as overly dogmatic about the minimum "knowable" ARG being the only goal (although the discussion casts a broader view).
+My remaining broad concern is that the paper is still in places somewhat narrow
+about the goals of future ARG development. I certainly see the practical
+utility of dropping inference down to some minimum ``knowable'' structure that
+can be reconstructed using deterministic algorithms for very large datasets.
+However, probabilistic reconstructions of some form of ARG with more explicit
+events is also a reasonable goal moving forwards (e.g. for some applications we
+may want a subset of the recombination events explicitly included). There are a
+few places where the paper still comes across as overly dogmatic about the
+minimum ``knowable'' ARG being the only goal (although the discussion casts a
+broader view).
 \end{point}
 \begin{reply}
-We have gone through the article and, in addition to the suggestions made below, have rephrased
-parts to make it clear that a gARG can be used to encode a \emph{variety} of ARG structures, whether events are or are not explicitly inferred by the reconstruction method. We specifically state at the end of \emph{A diversity of structures} that
+We have gone through the article and, in addition to the suggestions made
+below, have rephrased parts to make it clear that a gARG can be used to encode
+a \emph{variety} of ARG structures, whether events are or are not explicitly
+inferred by the reconstruction method. We specifically state at the end of
+\emph{A diversity of structures} that
 \begin{quote}
 A gARG can encode a diversity of ARG structures, including
 those where events \emph{are} recorded explicitly, and those where
@@ -70,94 +82,143 @@ \section*{Associate Editor's comments}
 \end{reply}

 \begin{point}
-Abstract: "This approach is out of step with modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them." So this is on the places where I feel like the authors state things too strongly. The authors, and some others, approaches have taken this path, but folks can agree that the gARG is a good idea and yet think that explicitly inferring details of recombination events is a `modern' goal.
+Abstract: ``This approach is out of step with modern developments, which do not
+represent genetic inheritance in terms of these events or explicitly infer
+them.'' So this is on the places where I feel like the authors state things too
+strongly. The authors, and some others, approaches have taken this path, but
+folks can agree that the gARG is a good idea and yet think that explicitly
+inferring details of recombination events is a `modern' goal.
 \end{point}
 \begin{reply}
-We have changed this to "This approach is out of step with many modern developments, however,..."
+We have changed this to ``This approach is out of step with some modern developments,
+however,...''
 \end{reply}

 \begin{point}
-"Broadly speaking, an ARG describes the different paths of genetic inheritance caused by recombination, encapsulating the resulting complex web of genetic ancestry " - add "of a set of samples". Also I'd say "genetic ancestors", as ancestry is tied up with genetic ancestry groups in peoples' minds.
+``Broadly speaking, an ARG describes the different paths of genetic inheritance
+caused by recombination, encapsulating the resulting complex web of genetic
+ancestry'' - add ``of a set of samples''. Also I'd say ``genetic ancestors'', as
+ancestry is tied up with genetic ancestry groups in peoples' minds.
 \end{point}
 \begin{reply}
 Amended as suggested.
 \end{reply}

 \begin{point}
-"We define a genome as the complete set of genetic material that a child inherits from one parent. A diploid individual therefore carries two genomes, one inherited from each parent (we assume diploids here for clarity, but the definitions apply to organisms of arbitrary ploidy). " -Excludes Y, mtDNA, and X as written, please revise, e.g. talk about autosomal genome.
+``We define a genome as the complete set of genetic material that a child
+inherits from one parent. A diploid individual therefore carries two genomes,
+one inherited from each parent (we assume diploids here for clarity, but the
+definitions apply to organisms of arbitrary ploidy). '' -Excludes Y, mtDNA, and
+X as written, please revise, e.g. talk about autosomal genome.
 \end{point}
 \begin{reply}
 Amended as suggested.
 \end{reply}

 \begin{point}
-"The topology of a gARG specifies that genetic inheritance occurred between particular ancestors and descendants, " -struggle slightly with word "particular" here as the identity of the ancestors is not known. Deleting "particular" is likely sufficient.
+``The topology of a gARG specifies that genetic inheritance occurred between
+particular ancestors and descendants, '' -struggle slightly with word
+``particular" here as the identity of the ancestors is not known. Deleting
+``particular" is likely sufficient.
 \end{point}
 \begin{reply}
 Amended as suggested.
 \end{reply}

 \begin{point}
-"This is sufficient to describe the effects of inheritance under any form of homologous recombination (such as multiple crossovers,..." -do you mean multiple crossovers during a single round of meiosis.
+``This is sufficient to describe the effects of inheritance under any form of
+homologous recombination (such as multiple crossovers,..." -do you mean
+multiple crossovers during a single round of meiosis.
 \end{point}
 \begin{reply}
 Yes - amended to clarify this.
 \end{reply}

 \begin{point}
-"In this encoding there are two types of internal node in the graph, representing the common ancestor and recombination events in the history of a sample. " stipulate that these are most recent common ancestor events.
+``In this encoding there are two types of internal node in the graph,
+representing the common ancestor and recombination events in the history of a
+sample. " stipulate that these are most recent common ancestor events.
 \end{point}
 \begin{reply}
 Amended as suggested.
 \end{reply}

 \begin{point}
-"This approach assumes all events are knowable, and does not provide an obvious mechanism for either aggregating multiple events or expressing uncertainty about them. While this is not a problem when describing the results of simulations". -Maybe one way to flip this around would be to say that because it arose from tracking a particular stochastic process it has these properties. Also I don't think it assumes that all events are knowable, eg we could construct some parsimonious ARG or probabilistic ARG. If we wish to express uncertainty about events we usually give draws from the posterior etc. I agree that might be computational prohibitive with large samples etc, but it seems like place to take a broad view. This seems like a place to acknowledge that for some applications we might want to explicitly reconstruct the events.
+``This approach assumes all events are knowable, and does not provide an obvious
+mechanism for either aggregating multiple events or expressing uncertainty
+about them. While this is not a problem when describing the results of
+simulations''. -Maybe one way to flip this around would be to say that because
+it arose from tracking a particular stochastic process it has these properties.
+Also I don't think it assumes that all events are knowable, eg we could
+construct some parsimonious ARG or probabilistic ARG. If we wish to express
+uncertainty about events we usually give draws from the posterior etc. I agree
+that might be computational prohibitive with large samples etc, but it seems
+like place to take a broad view. This seems like a place to acknowledge that
+for some applications we might want to explicitly reconstruct the events.
 \end{point}
 \begin{reply}
 We have rephrased this part to read
 \begin{quote}
-This approach necessitates that all events are recorded explicitly, and does not
-provide an obvious mechanism for either aggregating multiple events
-or expressing uncertainty about them. While this is not a
-problem when describing the results of simulations, for instance (where all details
-are perfectly known), it is an issue when we wish to
-formally describe the output of inference methods which do not
-necessarily attempt to infer events that are not \emph{knowable} from the data,
-particularly as datasets approach the population scale...
 \end{quote}
+This approach
+requires all events to be recorded explicitly, and does not
+provide an obvious mechanism for aggregating multiple, potentially
+unresolvable, events.
+As datasets approach the population scale [citations]
+representing such uncertainty
+directly through the data structure is a useful alternative to
+classical methods based on probabilistic sampling.
 \end{reply}

 \begin{point}
-"A key feature of the gARG encoding is that it enables these varying levels of precision to be represented, and brings these nuanced features to light." -the word nuanced feels strange here.
+``A key feature of the gARG encoding is that it enables these varying levels of
+precision to be represented, and brings these nuanced features to light." -the
+word nuanced feels strange here.
 \end{point}
 \begin{reply}
 We have deleted the second part of this sentence.
 \end{reply}

 \begin{point}
-"Simpler representations can be formed by removing "unknowable" nodes (Fig. 5B)" -unknowable is vague here, do you mean bubbles along a single lineage?
+``Simpler representations can be formed by removing ``unknowable" nodes (Fig.
+5B)" -unknowable is vague here, do you mean bubbles along a single lineage?
 \end{point}
 \begin{reply}
-We've added a clarification that this refers to nodes such as those in singly-connected graph components.
+We've added a clarification that this refers to nodes such as those
+in singly-connected graph components.
 \end{reply}

 \begin{point}
-"The gARG encoding leads to highly efficient storage and processing of ARG data, "-As gARG has various levels of precision, perhaps this needs to state that the "gARG encoding can lead to..." or be more precise that this is a reduced precision level.
+``The gARG encoding leads to highly efficient storage and processing of ARG
+data, "-As gARG has various levels of precision, perhaps this needs to state
+that the "gARG encoding can lead to..." or be more precise that this is a
+reduced precision level.
 \end{point}
 \begin{reply}
-Amended as suggested to add "can lead to".
+Amended as suggested to add ``can lead to".
 \end{reply}

 \begin{point}
-"The succinct tree sequence data structure (usually known as a "tree sequence" for brevity) is a practical gARG implementation focused on efficiency." - If the tree sequence is focused at a particular level of gARG simplification be precise about this.
+``The succinct tree sequence data structure (usually known as a ``tree sequence"
+for brevity) is a practical gARG implementation focused on efficiency." - If
+the tree sequence is focused at a particular level of gARG simplification be
+precise about this.
 \end{point}
 \begin{reply}
-We have left this sentence as is, since the tree sequence structure can record gARGs at various levels of simplification.
+We have left this sentence as is, since the tree sequence structure
+can record gARGs at various levels of simplification.
 \end{reply}

 \begin{point}
-"Methods targeting large-scale datasets tend to simplify the inference problem by making a single, deterministic best-guess " --I think this is the best guess of the topology, and the uncertainty in times given the ARG is downstream of this. If so please clarify. Also I'd perhaps explicitly acknowledge Deng et al (SINGER), e.g. "deterministic best-guess of the topology (see Deng et al for parallel developments addressing uncertainty with somewhat small sample sizes)" or something like that. While these deterministic approaches are a strong way forward for human biobank scale data, it's good to be highlight parallel developments that might be key to other applications.
+``Methods targeting large-scale datasets tend to simplify the inference problem
+by making a single, deterministic best-guess " --I think this is the best guess
+of the topology, and the uncertainty in times given the ARG is downstream of
+this. If so please clarify. Also I'd perhaps explicitly acknowledge Deng et al
+(SINGER), e.g. ``deterministic best-guess of the topology (see Deng et al for
+parallel developments addressing uncertainty with somewhat small sample sizes)"
+or something like that. While these deterministic approaches are a strong way
+forward for human biobank scale data, it's good to be highlight parallel
+developments that might be key to other applications.
 \end{point}
 \begin{reply}
 We have mentioned this as suggested.
