Introduce a more flexible overlapping iterator API #102
aneubeck wants to merge 2 commits into daac-tools:main
|
@aneubeck We have considered it, but find it unnecessary to implement this feature.

```rust
impl Iterator for MyIterator {
    // The matcher consumes bytes, so the iterator must yield `u8`.
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        let c = ...;
        return c;
    }
}

let it = MyIterator { ... };
for m in pma.find_overlapping_iter_from_iter(it) {
}
```
|
Thanks for taking a look! Since we have another crate (https://github.com/github/rust-gems/tree/main/crates/bpe) which requires this functionality, we are publishing a fork to crates.io so that we can depend on it as a workaround: https://github.com/aneubeck/daachorse/pull/1/files. Let us know if you have any concerns, or if you have a proposal for how we could implement our use case on top of main.
|
@aneubeck How about the following implementation?

```rust
use std::cell::RefCell;

use daachorse::DoubleArrayAhoCorasick;

struct RefCellIter<'a>(&'a RefCell<Option<u8>>);

impl Iterator for RefCellIter<'_> {
    type Item = u8;

    fn next(&mut self) -> Option<Self::Item> {
        self.0.borrow_mut().take()
    }
}

fn main() {
    let next_char = RefCell::new(None);
    let pma = DoubleArrayAhoCorasick::<u32>::new(&["a", "abcd", "ab", "bc"]).unwrap();
    let mut it = pma.find_overlapping_iter_from_iter(
        b"ab".iter().cloned().chain(RefCellIter(&next_char)),
    );
    println!("iterates 'ab'");
    for m in &mut it {
        println!("{m:?}");
    }
    println!("consumes 'c'");
    next_char.borrow_mut().replace(b'c');
    for m in &mut it {
        println!("{m:?}");
    }
    println!("consumes 'd'");
    next_char.borrow_mut().replace(b'd');
    for m in &mut it {
        println!("{m:?}");
    }
}
```
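The input side of this pattern can be tried without daachorse. Below is a minimal std-only sketch: `SlotIter` and `feed_in_rounds` are hypothetical names, and a plain loop stands in for the matcher. It relies on `chain` continuing to poll its second iterator after that iterator has returned `None`, which holds for the standard library implementation at the time of writing. It says nothing about how the matcher itself behaves when resumed.

```rust
use std::cell::RefCell;

/// Hypothetical one-byte slot iterator: yields the stored byte once,
/// then `None` until the slot is refilled.
struct SlotIter<'a>(&'a RefCell<Option<u8>>);

impl Iterator for SlotIter<'_> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        self.0.borrow_mut().take()
    }
}

/// Feeds "ab", then 'c', then 'd', and records what each round produced.
/// The inner loops are a stand-in for a consumer such as a matcher.
fn feed_in_rounds() -> Vec<Vec<u8>> {
    let slot = RefCell::new(None);
    let mut input = b"ab".iter().copied().chain(SlotIter(&slot));
    let mut rounds = Vec::new();

    // Round 1: the slot is empty, so the input ends after the fixed prefix.
    let mut seen = Vec::new();
    for byte in input.by_ref() {
        seen.push(byte);
    }
    rounds.push(seen);

    // Later rounds: refill the slot, then resume the same iterator.
    for next in [b'c', b'd'] {
        *slot.borrow_mut() = Some(next);
        let mut seen = Vec::new();
        for byte in input.by_ref() {
            seen.push(byte);
        }
        rounds.push(seen);
    }
    rounds
}

fn main() {
    for round in feed_in_rounds() {
        println!("{}", String::from_utf8_lossy(&round));
    }
}
```

Note that this only moves forward: once a byte has been handed to the consumer, nothing in this setup lets the consumer return to an earlier position.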
|
Thanks for the proposal! The challenge is that we want to be able to go back to an earlier snapshot of the processing, which means we also need some way to restore the state of the Aho-Corasick iterator at that point. I know it's not great to leak such an implementation detail.
Please let me know if you would be willing to accept such a change.
The existing overlapping iterator implementation could be simplified with this API as well.
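
To make the snapshot requirement concrete, here is a minimal sketch using a toy single-pattern automaton. Everything in it (`ToyMatcher`, `State`, `step`, `backtrack_demo`) is hypothetical and is neither daachorse's API nor the API proposed in this PR. It only illustrates the general idea: when the matcher's state is an explicit value the caller holds, going back to an earlier point is just reusing an older copy of that value.

```rust
/// Hypothetical toy automaton for a single pattern (not daachorse).
struct ToyMatcher<'p> {
    pattern: &'p [u8],
}

/// The whole matcher state is one small `Copy` value,
/// so taking a snapshot is just copying it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct State(usize);

impl ToyMatcher<'_> {
    fn start(&self) -> State {
        State(0)
    }

    /// Consumes one byte and returns the new state plus whether the
    /// pattern ends at this byte. The state is the length of the longest
    /// pattern prefix that is a suffix of the input consumed so far.
    fn step(&self, state: State, byte: u8) -> (State, bool) {
        let mut seen = self.pattern[..state.0].to_vec();
        seen.push(byte);
        let max = seen.len().min(self.pattern.len());
        let len = (0..=max)
            .rev()
            .find(|&k| seen.ends_with(&self.pattern[..k]))
            .unwrap_or(0);
        (State(len), len == self.pattern.len())
    }
}

/// Consumes "ab", tries 'x', then rolls back and tries 'c' instead.
fn backtrack_demo() -> (bool, bool) {
    let matcher = ToyMatcher { pattern: b"abc" };
    let mut state = matcher.start();
    for &byte in b"ab" {
        state = matcher.step(state, byte).0;
    }
    let snapshot = state; // remember the state after "ab"

    let (after_x, matched_x) = matcher.step(state, b'x');
    state = after_x; // the speculative branch moved the state on

    state = snapshot; // roll back to the state after "ab"
    let (_, matched_c) = matcher.step(state, b'c');
    (matched_x, matched_c)
}

fn main() {
    println!("{:?}", backtrack_demo());
}
```

With an iterator that owns its state internally, as in the `RefCell` approach, there is no equivalent of the `state = snapshot` line, which is the gap described above.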