
Commit 7162ef9

Fixup tests
1 parent d78b045 commit 7162ef9


2 files changed (+18 / -6 lines)


docs/alignments_analysis.md

Lines changed: 15 additions & 3 deletions
````diff
@@ -111,14 +111,14 @@ np.mean(gap_count)
 ```
 
 :::{warning}
-The arrays returned by the ``alignment`` are **zero based** and you
+The arrays returned by the ``alignment`` interface are **zero based** and you
 must compensate to use **one-based** coordinates.
 :::
 
 If you want to access
 specific slices of the array based on **one-based** coordinates, it's important
 to take the zero-based nature of this into account. Suppose we wanted to
-access the first 10 bases of Spike for a give sample. The first
+access the first 10 bases of Spike for a given sample. The first
 base of Spike is 21563 in standard one-based coordinates. While we could do
 some arithmetic to compensate, the simplest way to translate is to simply
 prepend some value to the alignment array:
````
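The hunk above recommends prepending a value to a zero-based alignment array so that one-based coordinates can be used directly as indexes. Here is a minimal NumPy sketch of that idea. The random stand-in alignment, its length, and the `"X"` placeholder are assumptions for illustration, not values taken from sc2ts.

```python
import numpy as np

# Hypothetical zero-based alignment: zero_based[i] is the base at one-based position i + 1.
# The random content and the length are arbitrary stand-ins for a real sample.
rng = np.random.default_rng(1)
zero_based = rng.choice(np.array(list("ACGT")), size=30_000)

# Prepend one placeholder so that a[k] is the base at one-based position k.
# "X" is an arbitrary choice for this sketch.
a = np.concatenate([np.array(["X"]), zero_based])

spike_start = 21_563  # first base of Spike in one-based coordinates
first_ten = a[spike_start : spike_start + 10]

# The same ten bases, found with explicit arithmetic on the unpadded array.
same = zero_based[spike_start - 1 : spike_start - 1 + 10]
assert np.array_equal(first_ten, same)
```

Prepending shifts every element by one, so `a[k]` and `zero_based[k - 1]` refer to the same base. The slice bounds can then be written exactly as they appear in one-based coordinates.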
````diff
@@ -129,7 +129,6 @@ spike_start = 21_563
 a[spike_start: spike_start + 10]
 ```
 
-
 (sec_alignments_analysis_data_encoding)=
 
 ## Alignment data encoding
````
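The encoding section that starts here goes on to warn, in the context lines of the following hunk, that `-1` is interpreted as the last element of a list in Python, and it recommends `decode_alleles`. The sketch below reproduces that tripwire with plain Python and NumPy. The allele order `ACGT-`, the use of `-1` for missing data, the `"N"` replacement character, and the helper `decode_with_missing` are all assumptions made for illustration. This is not how sc2ts's `decode_alleles` is implemented.

```python
import numpy as np

# Assumed allele order for this sketch only.
ALLELES = ["A", "C", "G", "T", "-"]

# Encoded haplotype: A, T, missing (assumed -1), gap.
encoded = np.array([0, 3, -1, 4], dtype=np.int8)

# Naive decoding: a negative index wraps around, so -1 silently becomes "-".
naive = "".join(ALLELES[code] for code in encoded)


def decode_with_missing(codes, alleles, missing_char="N"):
    """Hypothetical helper: map negative codes to missing_char instead of wrapping."""
    return "".join(missing_char if code < 0 else alleles[code] for code in codes)


print(naive)  # AT--
print(decode_with_missing(encoded, ALLELES))  # ATN-
```

The naive version produces no error, which is what makes it a tripwire: the missing call is indistinguishable from a real gap in the decoded string.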
```diff
@@ -182,3 +181,16 @@ strings, because -1 is interpreted as the last element of the list in Python. Pl
 use the {func}`decode_alleles` function to avoid this tripwire.
 :::
 
+
+## Accessing by variant
+
+A unique feature of the VCF Zarr encoding used here is that we can efficiently access
+the alignment data by sample **and** by site. The best way to access data by site
+is to use the {meth}`Dataset.variants` method.
+
+:::{note}
+The {meth}`Dataset.variants` method is deliberately designed to mirror the API
+of the corresponding [tskit](https://tskit.dev) function
+({meth}`tskit.TreeSequence.variants`).
+:::
+
```
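The new section points readers to `Dataset.variants` for site-wise access and notes that it mirrors `tskit.TreeSequence.variants`. As a rough illustration of what "by site" versus "by sample" access means, the sketch below iterates over an in-memory `(num_sites, num_samples)` matrix of allele codes. The `VariantSketch` class, the `variants_sketch` function, and the matrix layout are hypothetical. They borrow only the general shape of tskit's `Variant` (a position, an allele list, and a genotypes array). The sketch models neither Zarr storage and chunking nor the actual signature of `Dataset.variants`.

```python
from dataclasses import dataclass
from typing import Iterator, List

import numpy as np


@dataclass
class VariantSketch:
    """Hypothetical per-site record, loosely shaped like tskit's Variant."""

    position: int
    alleles: List[str]
    genotypes: np.ndarray  # one allele code per sample at this site


def variants_sketch(positions, alleles, genotypes) -> Iterator[VariantSketch]:
    """Yield one record per site from a (num_sites, num_samples) code matrix."""
    for j, pos in enumerate(positions):
        # Row j holds the codes of every sample at site j.
        yield VariantSketch(int(pos), list(alleles), genotypes[j])


positions = np.array([100, 250, 4_000])
alleles = ["A", "C", "G", "T", "-"]
# Assumed layout: rows are sites, columns are samples (3 sites x 4 samples).
G = np.array(
    [
        [0, 0, 1, 0],
        [3, 3, 3, -1],
        [2, 4, 2, 2],
    ],
    dtype=np.int8,
)

# Access by site: iterate over rows.
for var in variants_sketch(positions, alleles, G):
    print(var.position, var.genotypes.tolist())

# Access by sample: take a column of the same matrix.
sample_1 = G[:, 1]
print(sample_1.tolist())  # [0, 3, 4]
```

With a single in-memory matrix both directions are cheap. On disk, an encoding that keeps both directions efficient is the property the new section highlights.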

tests/test_inference.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -188,7 +188,7 @@ def test_match_reference(self, mirror):
         ts = tables.tree_sequence()
         alignment = sc2ts.data_import.get_reference_sequence(as_array=True)
         alignment[0] = "A"
-        a = jit.encode_alignment(alignment)
+        a = jit.encode_alleles(alignment)
         h = a[ts.sites_position.astype(int)]
         samples = [si.Sample("test", "2020-01-01", haplotype=h)]
         matches = self.match_tsinfer(samples, ts, mirror_coordinates=mirror)
```
```diff
@@ -205,7 +205,7 @@ def test_match_reference_one_mutation(self, mirror, site_id):
         ts = tables.tree_sequence()
         alignment = sc2ts.data_import.get_reference_sequence(as_array=True)
         alignment[0] = "A"
-        a = jit.encode_alignment(alignment)
+        a = jit.encode_alleles(alignment)
         h = a[ts.sites_position.astype(int)]
         samples = [si.Sample("test", "2020-01-01", haplotype=h)]
         # Mutate to gap
```
```diff
@@ -231,7 +231,7 @@ def test_match_reference_all_same(self, mirror, allele):
         ts = tables.tree_sequence()
         alignment = sc2ts.data_import.get_reference_sequence(as_array=True)
         alignment[0] = "A"
-        a = jit.encode_alignment(alignment)
+        a = jit.encode_alleles(alignment)
         ref = a[ts.sites_position.astype(int)]
         h = np.zeros_like(ref) + allele
         samples = [si.Sample("test", "2020-01-01", haplotype=h)]
```
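Each test hunk now calls `jit.encode_alleles` (previously `jit.encode_alignment`) on the reference and then picks out the haplotype with `a[ts.sites_position.astype(int)]`. The sketch below mimics that pattern with NumPy only. The allele-to-code table and `encode_alleles_sketch` are hypothetical stand-ins, not sc2ts's encoder. The ten-character reference and its placeholder at index 0 are also assumptions of the sketch. The cast is needed because tskit stores site positions as floating-point values, and NumPy rejects float arrays as indexes.

```python
import numpy as np

# Hypothetical allele-to-code table; sc2ts's real encoding may differ.
ALLELE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "-": 4}


def encode_alleles_sketch(alignment):
    """Map each character of a 1-D character array to its integer code."""
    return np.array([ALLELE_CODES[c] for c in alignment], dtype=np.int8)


# Made-up reference with a placeholder at index 0, so index k is position k.
reference = np.array(list("XACGTACGTT"))
# Overwrite the placeholder with an encodable base, as the tests do with alignment[0].
reference[0] = "A"
a = encode_alleles_sketch(reference)

# Site positions as a float array, like ts.sites_position; cast before indexing.
sites_position = np.array([2.0, 5.0, 9.0])
h = a[sites_position.astype(int)]
print(h.tolist())  # [1, 0, 3]

# A haplotype carrying the same allele at every site, as in test_match_reference_all_same.
allele = 1
h_same = np.zeros_like(h) + allele
print(h_same.tolist())  # [1, 1, 1]
```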
