Commit dc6f164

Merge pull request #25 from TravisWheelerLab/dev
nail v0.5.0 | libnail v0.5.0
2 parents b6291d3 + d4e3b88 commit dc6f164

46 files changed (+11940, -2565 lines)

README.md

Lines changed: 22 additions & 38 deletions
@@ -180,7 +180,7 @@ Checking the `results.ali` will look something like:
 
 ### nail seeds
 
-If you run `nail search --only-seed` command, nail will run MMseqs2, produce a `seeds.json` file, and terminate.
+If you run `nail search --only-seed` command, nail will run MMseqs2, produce a `seeds.tsv` file, and terminate.
 This may be useful if you would like to experiment with different nail settings using the same seeds.
 
 For example:
@@ -189,43 +189,27 @@ For example:
 
 You can also save the seeds from a full run of the `nail search` pipeline by supplying a `--seeds-out` argument:
 
-$ nail search --seeds-out seeds.json query.hmm target.fa
-
-Seeds can be provided to `nail search` using the `--seeds <seeds.json>` flag, which will skip the seed step in the search pipeline.
-
-$ nail search --seeds seeds.json query.hmm target.fa
-
-In practice, these seeds may be produced from any source as long as they are formatted in the following way:
-
-```
-{
-"query1": {
-"target1": {
-"seq_start": 48, // <- these are the positions from which
-"seq_end": 287, // < nail will begin the cloud search
-"prf_start": 1, // <
-"prf_end": 259, // <
-"score": 168.0 // <--- the score field is used to pick between
-}, seeds that compete with each other
-"target2": {
-"seq_start": 72,
-"seq_end": 343,
-"prf_start": 23,
-"prf_end": 259,
-"score": 106.0
-},
-"query2": {
-"target3": {
-"seq_start": 56,
-"seq_end": 303,
-"prf_start": 1,
-"prf_end": 259,
-"score": 125.0
-},
-}
-...
-}
-```
+$ nail search --seeds-out seeds.tsv query.hmm target.fa
+
+Seeds can be provided to `nail search` using the `--seeds <seeds.tsv>` flag, which will skip the seed step in the search pipeline.
+
+$ nail search --seeds seeds.tsv query.hmm target.fa
+
+In practice, these seeds may be produced from any source; the input file just needs to be a tsv of the following shape:
+
+| query | target | query start | query end | target start | target end | score | E-value |
+|--------|----------|-------------|-----------|--------------|------------|-------|-----------|
+| query1 | target1 | 5 | 247 | 54 | 302 | 189 | 3.660E-55 |
+| query1 | target2 | 12 | 255 | 58 | 305 | 187 | 1.281E-54 |
+| query1 | target3 | 3 | 263 | 51 | 315 | 182 | 7.499E-53 |
+| query1 | target4 | 18 | 240 | 43 | 325 | 176 | 8.188E-51 |
+| query2 | target5 | 7 | 238 | 579 | 838 | 183 | 2.854E-53 |
+| query2 | target6 | 2 | 262 | 570 | 829 | 181 | 9.992E-53 |
+| query2 | target7 | 15 | 244 | 573 | 832 | 181 | 9.992E-53 |
+| query2 | target8 | 9 | 233 | 581 | 840 | 181 | 9.992E-53 |
+| query3 | target9 | 4 | 118 | 192 | 324 | 125 | 1.899E-35 |
+| query3 | target10 | 10 | 141 | 204 | 336 | 124 | 3.571E-35 |
+| query3 | target11 | 6 | 122 | 190 | 322 | 123 | 1.732E-34 |
 
 We plan to make the use of custom seeds more robust in the future.
 
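The updated README describes the seed input as an eight-column TSV. Below is a minimal parsing sketch, assuming tab-separated fields in the same column order as the README table and no header row. `SeedRecord`, `parse_seed_line`, and `parse_seeds` are hypothetical names used only for illustration; they are not libnail's `Seed` type, which according to the CHANGELOG now carries `prf`, `seq`, and `e_value` fields.

```rust
use std::str::FromStr;

/// One row of a seeds TSV, mirroring the README columns.
/// Hypothetical type for illustration; not part of libnail.
#[derive(Debug, Clone, PartialEq)]
pub struct SeedRecord {
    pub query: String,
    pub target: String,
    pub query_start: usize,
    pub query_end: usize,
    pub target_start: usize,
    pub target_end: usize,
    pub score: f32,
    pub e_value: f64,
}

/// Parses column `idx` as `T`, naming the column in the error message.
fn parse_field<T: FromStr>(fields: &[&str], idx: usize, name: &str) -> Result<T, String> {
    fields[idx]
        .trim()
        .parse::<T>()
        .map_err(|_| format!("could not parse {} from {:?}", name, fields[idx]))
}

/// Parses one tab-separated line that must have exactly eight columns.
pub fn parse_seed_line(line: &str) -> Result<SeedRecord, String> {
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 8 {
        return Err(format!("expected 8 tab-separated columns, found {}", fields.len()));
    }
    Ok(SeedRecord {
        query: fields[0].to_string(),
        target: fields[1].to_string(),
        query_start: parse_field(&fields, 2, "query start")?,
        query_end: parse_field(&fields, 3, "query end")?,
        target_start: parse_field(&fields, 4, "target start")?,
        target_end: parse_field(&fields, 5, "target end")?,
        score: parse_field(&fields, 6, "score")?,
        // Rust's float parser accepts scientific notation such as "3.660E-55".
        e_value: parse_field(&fields, 7, "E-value")?,
    })
}

/// Parses a whole seeds file, skipping blank lines; stops at the first bad line.
pub fn parse_seeds(text: &str) -> Result<Vec<SeedRecord>, String> {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(parse_seed_line)
        .collect()
}
```

Returning `Result<_, String>` keeps the sketch dependency-free; a real tool could use `anyhow`, which libnail already lists as a dependency.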
fixtures/a.seeds

Lines changed: 411 additions & 0 deletions
Large diffs are not rendered by default.

fixtures/query.hmm

Lines changed: 1977 additions & 0 deletions
Large diffs are not rendered by default.

fixtures/target.fa

Lines changed: 4598 additions & 0 deletions
Large diffs are not rendered by default.

libnail/CHANGELOG.md

Lines changed: 30 additions & 1 deletion
@@ -14,6 +14,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Security
 -->
 
+## [Unreleased]
+
+## [0.5.0] - 2026-3-19
+
+### Added
+- added struct `AmbiguityMap`
+- added traits `AminoUtilsUtf8` and `AminoUtilsDigital`
+- added enum `Transition`
+- added mod `profile::blosum62`
+- added struct `ProfileBuilder`
+- added function `Profile::from_blosum62_and_seq()`
+
+### Changed
+- removed transition index constants under `Profile` namespace
+- renamed `Profile.forward_tau` and `Profile.forward_lambda` to `fwd_tau` and `fwd_lambda`
+- renamed `Profile.consensus_sequence_bytes_utf8` to `consensus_seq_bytes_utf8`
+- moved `Alphabet` and`AminoAcid` to mod `alphabet`
+- split self mutating functions from trait `VecMath` into new trait `VecMathMut`
+- struct `Seed` now has `prf`, `seq`, `e_value` fields (probably temporary)
+
+### Removed
+- removed `Hmm` struct
+- removed function `Profile::new()`
+
+### Fixed
+
+- fixed boundary condition bugs in `forward()`, `backward()`, and `posterior()`
 
 ## [0.4.0] - 2025-6-18
 
@@ -28,6 +55,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - added structs `Emission`, `CoreToCore`, `BackgroundLoop`, `CoreEntry`
 - added traits `MinAssign`, `MaxAssign` and blanket impls
 - added macro `assert_eq_pairs!`
+
 ### Changed
 - `Profile.match_scores` and `Profile.insert_scores` vectors combined into `Profile.emission_scores`
 - structs `Sequence`, `Profile` now have an ending pad position
@@ -36,6 +64,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - renamed struct `AntiDiagonalBounds` to `Cloud`
 - changed cloud search implementation to better match the Forward/Backward recurrence
 - renamed trait `VecUtils` to `VecMath`
+
 ### Removed
 - removed structs `CloudAntiDiagonal`, `CloudMatrixLinear`
 
@@ -53,7 +82,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - the mean relative entropy (MRE) can now be raised or lowered
 - the algorithm for raising (MRE) has been improved
 - `util::avg_relative_entropy()` is now `mean_relative_entropy()`
-- parse_hmms_from_p7hmm_file(path) is now Hmm::from_p7hmm(buf)
+- `parse_hmms_from_p7hmm_file(path)` is now `Hmm::from_p7hmm(buf)`
 - refactored Alignment struct
 - fields are now grouped into sub-structs
 - refactored AlignmentBuilder for changes
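The 0.5.0 entries replace the `Profile` transition index constants with an enum `Transition`, and the code diffs below call `profile.transition_score(Transition::MD as usize, profile_idx)`. A minimal sketch of that pattern follows. The variant names are the ones that appear in this commit (`MM`, `MI`, `MD`, `IM`, `II`, `DM`, `DD`, `BM`), but their order, the explicit discriminants, and the `TransitionTable` type are assumptions for illustration, not libnail's definitions.

```rust
/// Hypothetical stand-in for libnail's `Transition` enum: variant names
/// follow the diff, but the order and discriminant values are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transition {
    MM = 0,
    MI = 1,
    MD = 2,
    IM = 3,
    II = 4,
    DM = 5,
    DD = 6,
    BM = 7,
}

pub const NUM_TRANSITIONS: usize = 8;

/// Hypothetical per-position table of log-space transition scores,
/// indexed as `scores[profile_idx][transition as usize]`.
pub struct TransitionTable {
    scores: Vec<[f32; NUM_TRANSITIONS]>,
}

impl TransitionTable {
    /// Creates rows for positions `0..=len`, with every transition impossible (-inf).
    pub fn new(len: usize) -> Self {
        TransitionTable {
            scores: vec![[f32::NEG_INFINITY; NUM_TRANSITIONS]; len + 1],
        }
    }

    pub fn set(&mut self, transition: Transition, profile_idx: usize, score: f32) {
        self.scores[profile_idx][transition as usize] = score;
    }

    /// Same call shape as `profile.transition_score(Transition::MD as usize, idx)`.
    pub fn transition_score(&self, transition_idx: usize, profile_idx: usize) -> f32 {
        self.scores[profile_idx][transition_idx]
    }
}
```

The enum makes call sites self-describing while `as usize` still permits plain array indexing; implementing `Index<Transition>` would remove the cast, but the diffs in this commit keep the cast.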

libnail/Cargo.toml

Lines changed: 2 additions & 3 deletions
@@ -1,6 +1,6 @@
 [package]
 name = "libnail"
-version = "0.4.0"
+version = "0.5.0"
 authors = ["Jack Roddy <jack.w.roddy@gmail.com>"]
 edition = "2021"
 license = "BSD-3-Clause"
@@ -21,12 +21,11 @@ phf = { version = "0.11", features = ["macros"] }
 regex = "1.7.0"
 anyhow = "1.0.66"
 thiserror = "1.0.37"
-serde = { version = "1.0", features = ["derive"] }
-serde_json = "1.0.93"
 lazy_static = "1.4.0"
 rand = "0.8.5"
 rand_pcg = "0.3.1"
 image = { version = "0.25.1" , optional = true }
+datasize = "0.2.15"
 
 [dev-dependencies]
 ctor = "0.4.1"

libnail/src/align/backward.rs

Lines changed: 16 additions & 19 deletions
@@ -1,5 +1,6 @@
 use crate::align::structs::{DpMatrix, RowBounds};
 use crate::log_sum;
+use crate::structs::profile::Transition;
 use crate::structs::{Profile, Sequence};
 use crate::util::log_add;
 
@@ -61,7 +62,7 @@ pub fn backward(
             log_sum!(
                 dp_matrix.get_special(row_bounds.seq_end, Profile::E_IDX),
                 dp_matrix.get_delete(row_bounds.seq_end, profile_idx + 1)
-                    + profile.transition_score(Profile::M_D_IDX, profile_idx)
+                    + profile.transition_score(Transition::MD as usize, profile_idx)
             ),
         );
 
@@ -72,7 +73,7 @@ pub fn backward(
             log_sum!(
                 dp_matrix.get_special(row_bounds.seq_end, Profile::E_IDX),
                 dp_matrix.get_delete(row_bounds.seq_end, profile_idx + 1)
-                    + profile.transition_score(Profile::D_D_IDX, profile_idx)
+                    + profile.transition_score(Transition::DD as usize, profile_idx)
             ),
         );
     }
@@ -89,7 +90,8 @@ pub fn backward(
             target_idx,
             Profile::B_IDX,
             dp_matrix.get_match(target_idx + 1, profile_start_on_current_row)
-                + profile.transition_score(Profile::B_M_IDX, profile_start_on_current_row - 1)
+                + profile
+                    .transition_score(Transition::BM as usize, profile_start_on_current_row - 1)
                 + profile.match_score(current_residue, profile_start_on_current_row),
         );
 
@@ -102,7 +104,7 @@ pub fn backward(
                 log_sum!(
                     dp_matrix.get_special(target_idx, Profile::B_IDX),
                     dp_matrix.get_match(target_idx + 1, profile_idx)
-                        + profile.transition_score(Profile::B_M_IDX, profile_idx - 1)
+                        + profile.transition_score(Transition::BM as usize, profile_idx - 1)
                         + profile.match_score(current_residue, profile_idx)
                 ),
             );
@@ -149,20 +151,20 @@ pub fn backward(
             dp_matrix.get_special(target_idx, Profile::E_IDX),
         );
 
-        for profile_idx in (profile_start_on_current_row..profile_end_on_current_row).rev() {
+        for profile_idx in (profile_start_on_current_row..=profile_end_on_current_row).rev() {
             dp_matrix.set_match(
                 target_idx,
                 profile_idx,
                 log_sum!(
                     dp_matrix.get_match(target_idx + 1, profile_idx + 1)
-                        + profile.transition_score(Profile::M_M_IDX, profile_idx)
+                        + profile.transition_score(Transition::MM as usize, profile_idx)
                         + profile.match_score(current_residue, profile_idx + 1),
                     dp_matrix.get_insert(target_idx + 1, profile_idx)
-                        + profile.transition_score(Profile::M_I_IDX, profile_idx)
+                        + profile.transition_score(Transition::MI as usize, profile_idx)
                         + profile.insert_score(current_residue, profile_idx),
                     dp_matrix.get_special(target_idx, Profile::E_IDX),
                     dp_matrix.get_delete(target_idx, profile_idx + 1)
-                        + profile.transition_score(Profile::M_D_IDX, profile_idx)
+                        + profile.transition_score(Transition::MD as usize, profile_idx)
                 ),
             );
 
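Apart from the `Transition` renames, the hunk above changes the match-state loop bound from `..` to `..=`, presumably one of the `backward()` boundary-condition fixes listed in the 0.5.0 CHANGELOG (the commit does not say which line fixes what). A small illustration of what that changes, using hypothetical helper names `columns_exclusive` and `columns_inclusive` that mimic the old and new loop headers:

```rust
/// Profile columns visited by the old loop header (exclusive end), highest first.
fn columns_exclusive(start: usize, end: usize) -> Vec<usize> {
    (start..end).rev().collect()
}

/// Profile columns visited by the new loop header (inclusive end), highest first.
fn columns_inclusive(start: usize, end: usize) -> Vec<usize> {
    (start..=end).rev().collect()
}
```

With `start = 3` and `end = 6`, the exclusive form visits 5, 4, 3 and never touches column 6, while the inclusive form visits 6, 5, 4, 3. Both forms still yield nothing when `start > end`, so rows with empty bounds behave the same either way.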
@@ -171,10 +173,10 @@ pub fn backward(
                 profile_idx,
                 log_sum!(
                     dp_matrix.get_match(target_idx + 1, profile_idx + 1)
-                        + profile.transition_score(Profile::I_M_IDX, profile_idx)
+                        + profile.transition_score(Transition::IM as usize, profile_idx)
                         + profile.match_score(current_residue, profile_idx + 1),
                     dp_matrix.get_insert(target_idx + 1, profile_idx)
-                        + profile.transition_score(Profile::I_I_IDX, profile_idx)
+                        + profile.transition_score(Transition::II as usize, profile_idx)
                         + profile.insert_score(current_residue, profile_idx)
                 ),
             );
@@ -184,10 +186,10 @@ pub fn backward(
                 profile_idx,
                 log_sum!(
                     dp_matrix.get_match(target_idx + 1, profile_idx + 1)
-                        + profile.transition_score(Profile::D_M_IDX, profile_idx)
+                        + profile.transition_score(Transition::DM as usize, profile_idx)
                         + profile.match_score(current_residue, profile_idx + 1),
                     dp_matrix.get_delete(target_idx, profile_idx + 1)
-                        + profile.transition_score(Profile::D_D_IDX, profile_idx),
+                        + profile.transition_score(Transition::DD as usize, profile_idx),
                     dp_matrix.get_special(target_idx, Profile::E_IDX)
                 ),
             );
@@ -203,7 +205,7 @@ pub fn backward(
         row_bounds.seq_start - 1,
         Profile::B_IDX,
         dp_matrix.get_match(row_bounds.seq_start, profile_start_in_first_row)
-            + profile.transition_score(Profile::B_M_IDX, 0)
+            + profile.transition_score(Transition::BM as usize, 0)
             + profile.match_score(first_target_character, 1),
     );
 
@@ -214,17 +216,12 @@ pub fn backward(
             log_sum!(
                 dp_matrix.get_special(row_bounds.seq_start - 1, Profile::B_IDX),
                 dp_matrix.get_match(row_bounds.seq_start, profile_idx)
-                    + profile.transition_score(Profile::B_M_IDX, profile_idx - 1)
+                    + profile.transition_score(Transition::BM as usize, profile_idx - 1)
                     + profile.match_score(first_target_character, profile_idx)
             ),
         );
     }
 
-    // dp_matrix.set_special(
-    // row_bounds.target_start - 1,
-    // Profile::SPECIAL_J_IDX,
-    // -f32::INFINITY,
-    // );
     dp_matrix.set_special(row_bounds.seq_start - 1, Profile::C_IDX, -f32::INFINITY);
     dp_matrix.set_special(row_bounds.seq_start - 1, Profile::E_IDX, -f32::INFINITY);
     dp_matrix.set_special(
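Every recurrence in `backward()` combines alternative paths with `log_sum!`, backed by `log_add` from `crate::util`. The sketch below shows what log-space addition computes, assuming scores are natural logs and using the exact `exp`/`ln_1p` formula. libnail's own `log_add` may use a faster approximation instead, and this `log_sum!` macro is a stand-in, not libnail's definition.

```rust
/// log(e^a + e^b) without overflow: factor out the larger term.
pub fn log_add(a: f32, b: f32) -> f32 {
    let (hi, lo) = if a >= b { (a, b) } else { (b, a) };
    if lo == f32::NEG_INFINITY {
        // log(0) adds nothing; this also covers the case where both inputs are -inf.
        return hi;
    }
    hi + (lo - hi).exp().ln_1p()
}

/// Folds any number of log-space terms with `log_add` (hypothetical stand-in).
macro_rules! log_sum {
    ($x:expr $(,)?) => {
        $x
    };
    ($x:expr, $($rest:expr),+ $(,)?) => {
        log_add($x, log_sum!($($rest),+))
    };
}
```

Subtracting the larger operand first keeps `exp` in the range (0, 1], so large scores cannot overflow, and `-f32::INFINITY` (used in the hunk above to mark impossible states) is absorbed cleanly.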

libnail/src/align/cloud_search.rs

Lines changed: 26 additions & 12 deletions
@@ -1,12 +1,20 @@
-use crate::align::structs::{Cloud, Seed};
-use crate::log_sum;
-use crate::max_f32;
-use crate::structs::profile::{AminoAcid, BackgroundLoop, CoreToCore, Emission};
-use crate::structs::{Profile, Sequence};
-use crate::util::{log_add, MaxAssign};
-
-use super::structs::{Ad, BackgroundState::*, Bound, Cell, CloudMatrix, CoreState::*, NewDpMatrix};
-use super::Nats;
+use std::fmt::Display;
+
+use crate::{
+    align::structs::{Cloud, Seed},
+    alphabet::AminoAcid,
+    log_sum, max_f32,
+    structs::{
+        profile::{BackgroundLoop, CoreToCore, Emission, Transition},
+        Profile, Sequence,
+    },
+    util::{log_add, MaxAssign},
+};
+
+use super::{
+    structs::{Ad, BackgroundState::*, Bound, Cell, CloudMatrix, CoreState::*, NewDpMatrix},
+    Nats,
+};
 
 #[derive(Clone)]
 pub struct CloudSearchParams {
@@ -15,6 +23,12 @@ pub struct CloudSearchParams {
     pub beta: f32,
 }
 
+impl Display for CloudSearchParams {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "γ{} α{} β{}", self.gamma, self.alpha, self.beta)
+    }
+}
+
 impl Default for CloudSearchParams {
     fn default() -> Self {
         CloudSearchParams {
@@ -106,7 +120,7 @@ pub fn compute_backward_cells<M>(
         mx[(B, seq_idx)] = log_sum!(
             mx[(B, seq_idx)],
             mx[m_src_cell]
-                + prf.transition_score(Profile::B_M_IDX, prf_idx)
+                + prf.transition_score(Transition::BM as usize, prf_idx)
                 + prf[Emission(m_src, residue)]
         );
     }
@@ -230,7 +244,7 @@ pub fn compute_forward_cells<M>(
         mx[m_src_cell] + prf[CoreToCore(m_src, m_dest)],
         mx[i_src_cell] + prf[CoreToCore(i_src, m_dest)],
         mx[d_src_cell] + prf[CoreToCore(d_src, m_dest)],
-        mx[(B, seq_idx - 1)] + prf.transition_score(Profile::B_M_IDX, prf_idx - 1)
+        mx[(B, seq_idx - 1)] + prf.transition_score(Transition::BM as usize, prf_idx - 1)
     ) + prf[Emission(m_dest, residue)];
 
     // insert state
@@ -263,7 +277,7 @@ pub fn compute_forward_cells<M>(
         mx[d_src_cell] + prf[CoreToCore(d_src, d_dest)]
     );
 
-    mx[(E, seq_idx)] = log_sum!(mx[m_cell], mx[d_cell], mx[(E, seq_idx)])
+    mx[(E, seq_idx)] = log_sum!(mx[m_cell], mx[d_cell], mx[(E, seq_idx)]);
 }
 
 pub fn cloud_search_fwd<M>(
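This commit adds a `Display` impl for `CloudSearchParams` that prints each parameter after a Greek letter. The usage sketch below reproduces it on a mirror struct. Only `beta: f32` is visible in the diff, so the `usize` type for `gamma` and `f32` for `alpha` are assumptions, and the example values are arbitrary rather than libnail's defaults.

```rust
use std::fmt::{self, Display};

/// Hypothetical mirror of `CloudSearchParams`; the `gamma` and `alpha`
/// field types are assumptions (only `beta: f32` appears in the diff).
#[derive(Clone)]
pub struct CloudSearchParams {
    pub gamma: usize,
    pub alpha: f32,
    pub beta: f32,
}

impl Display for CloudSearchParams {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same format string as the commit: a Greek letter before each value.
        write!(f, "γ{} α{} β{}", self.gamma, self.alpha, self.beta)
    }
}
```

Because `Display` for `f32` omits a trailing `.0`, whole-number settings print compactly: `gamma: 5, alpha: 12.0, beta: 20.0` renders as `γ5 α12 β20`, which is convenient for labeling runs in logs.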
