# `filter` and `map` with `all` on a `HashSet`

```rust
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .bytes()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .all(|c| hs.insert(c))
}
```

With this approach, you instantiate and update a [`HashSet`][hashset] to keep track of the letters already seen.

A [`use`][use] declaration allows referring to `HashSet` directly instead of by its full [namespace][namespaces] path.
Without the `use` declaration, the `HashSet` would be instantiated like so:

```rust
let mut hs = std::collections::HashSet::new();
```

After the `HashSet` is instantiated, a chain of iterator methods is called on the `candidate` `&str`.
- Since all of the characters are expected to be [ASCII][ascii], they can be iterated with the [`bytes`][bytes] method.
Each byte is iterated as a [`u8`][u8], which is an unsigned 8-bit integer.
- The [`filter`][filter] method [borrows][borrow] each byte as a [reference][reference] to a `u8` (`&u8`).
Inside its [closure][closure], it tests whether each byte [`is_ascii_alphabetic`][is-ascii-alphabetic].
Only bytes that are ASCII letters survive the `filter` and are passed on to the [`map`][map] method.
- The `map` method calls [`to_ascii_lowercase`][to-ascii-lowercase] on each byte.
- Each lowercased byte is then tested by the [`all`][all] method, using the [`insert`][insert] method of `HashSet`.
`all` returns `true` if every call to `insert` returns `true`.
If a call to `insert` returns `false`, then `all` "short-circuits" and immediately returns `false`.
The `insert` method returns whether the value was _newly_ inserted.
So, for the word `"alpha"`, `insert` returns `true` when the first `a` is inserted,
but returns `false` when the second `a` is inserted.
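
To make each stage of the chain visible, the following sketch collects intermediate results instead of calling `all`. The helper names `letters_lowercased` and `insert_results` are hypothetical, chosen only for this illustration.

```rust
use std::collections::HashSet;

/// Hypothetical helper: runs only the `bytes`, `filter`, and `map`
/// stages and collects the surviving bytes.
pub fn letters_lowercased(candidate: &str) -> Vec<u8> {
    candidate
        .bytes()
        // `c` is a `&u8` here; `is_ascii_alphabetic` takes `&self`.
        .filter(|c| c.is_ascii_alphabetic())
        // `c` is a `u8` here; lowercase it.
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Hypothetical helper: records what `HashSet::insert` returns for each
/// lowercased letter. Unlike `all`, it does not stop at the first `false`,
/// so every result is visible.
pub fn insert_results(candidate: &str) -> Vec<bool> {
    let mut hs = HashSet::new();
    letters_lowercased(candidate)
        .into_iter()
        .map(|c| hs.insert(c))
        .collect()
}
```

For `"Six-Year"`, `letters_lowercased` drops the hyphen and yields the bytes of `"sixyear"`. For `"alpha"`, `insert_results` returns `[true, true, true, true, false]`; the final `false` is where `all` would short-circuit.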

## Refactoring

### using the `str` method [`to_ascii_lowercase`][str-to-ascii-lowercase] and no `map`

You might want to call the `str` method [`to_ascii_lowercase`][str-to-ascii-lowercase] up front and skip calling `map`,
like so:

```rust
candidate
    .to_ascii_lowercase()
    .bytes()
    .filter(|c| c.is_ascii_alphabetic())
    .all(|c| hs.insert(c))
```

However, changing the case of every character in the `str` raised the average benchmark time by a few nanoseconds.
It is a bit faster to first `filter` out bytes that are not ASCII letters and then change the case of each surviving byte.
Since the performance is fairly close, either may be preferred.
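
The fragment above relies on the surrounding function from the first example. A complete version might look like the following sketch. The comment about the extra allocation describes the return type of `str::to_ascii_lowercase`; it is not a measured explanation of the benchmark difference.

```rust
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        // `str::to_ascii_lowercase` returns a new `String`,
        // so this version allocates one extra string per call.
        .to_ascii_lowercase()
        .bytes()
        .filter(|c| c.is_ascii_alphabetic())
        .all(|c| hs.insert(c))
}
```

Behavior is the same as the original for ASCII input: `check("isogram")` is `true` and `check("Alphabet")` is `false`.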

### using `filter_map`

Since `filter` and `map` are used, this approach could be refactored using the [`filter_map`][filter-map] method.

```rust
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .bytes()
        .filter_map(|c| c.is_ascii_alphabetic().then(|| c.to_ascii_lowercase()))
        .all(|c| hs.insert(c))
}
```

By chaining the [`then`][then] method to the result of `is_ascii_alphabetic`,
and calling `to_ascii_lowercase` in the closure passed to `then`,
`filter_map` passes only lowercased ASCII letter bytes to the `all` method.
In benchmarking, this approach was slightly slower, but its style may be preferred.
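
The key step is that `bool::then` turns the `bool` into an `Option`: `Some` of the closure's result when `true`, `None` when `false`, and `filter_map` keeps only the `Some` values. The sketch below isolates that step in a hypothetical helper, `lower_if_letter`, and reuses it as the `filter_map` argument in another hypothetical helper, `lowered_letters`.

```rust
/// Hypothetical helper: `Some(lowercased byte)` for an ASCII letter,
/// `None` for anything else.
pub fn lower_if_letter(c: u8) -> Option<u8> {
    c.is_ascii_alphabetic().then(|| c.to_ascii_lowercase())
}

/// Hypothetical helper: the `filter_map` stage on its own, collected
/// into a `Vec` so the surviving bytes can be inspected.
pub fn lowered_letters(candidate: &str) -> Vec<u8> {
    candidate
        .bytes()
        // Passing the function directly is equivalent to the inline closure.
        .filter_map(lower_if_letter)
        .collect()
}
```

For example, `lower_if_letter(b'Q')` is `Some(b'q')`, `lower_if_letter(b'-')` is `None`, and `lowered_letters("Six-Year")` yields the bytes of `"sixyear"`.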

### supporting [Unicode][unicode]

By substituting the [`chars`][chars] method for the `bytes` method,
and by using the [`char`][char] methods [`is_alphabetic`][is-alphabetic] and [`to_lowercase`][char-to-lowercase],
this approach can support Unicode characters.

```rust
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .chars()
        .filter(|c| c.is_alphabetic())
        // `char::to_lowercase` returns an iterator, because some characters
        // lowercase to more than one `char`, so it is converted into a `String`.
        .map(|c| c.to_lowercase().to_string())
        .all(|c| hs.insert(c))
}
```
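
To illustrate what switching to `chars` changes, the sketch below puts the byte version and the `char` version side by side under the hypothetical names `check_bytes` and `check_chars`. With the input `"Éé"`, none of the bytes are ASCII letters, so the byte version filters everything out and reports an isogram, while the `char` version lowercases `É` to `é` and finds the repeat.

```rust
use std::collections::HashSet;

/// Byte-based version from the first example (renamed for comparison).
pub fn check_bytes(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .bytes()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .all(|c| hs.insert(c))
}

/// `char`-based version from the section above (renamed for comparison).
pub fn check_chars(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .chars()
        .filter(|c| c.is_alphabetic())
        .map(|c| c.to_lowercase().to_string())
        .all(|c| hs.insert(c))
}
```

Both agree on ASCII input such as `"isogram"` and `"Alphabet"`; they differ only once non-ASCII letters appear.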

Usually an approach that supports Unicode will be slower than one that supports only bytes.
However, the benchmark for this approach was significantly slower, taking more than twice as long as the bytes approach.
Much of that cost likely comes from allocating a new `String` for every letter in the `map`.
It can be further refactored to use the `str` [`to_lowercase`][str-to-lowercase] method and remove the `map` method,
to cut the benchmark down closer to that of the bytes approach.

```rust
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        // Lowercase the whole string once, then iterate over its `char`s.
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphabetic())
        .all(|c| hs.insert(c))
}
```
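
The two Unicode versions are close but not strictly equivalent: `str::to_lowercase` applies context-sensitive rules that `char::to_lowercase` cannot, such as lowercasing a word-final capital sigma `Σ` to `ς` rather than `σ`. The sketch below, using the hypothetical names `check_per_char` and `check_whole_str`, shows an input where the two disagree.

```rust
use std::collections::HashSet;

/// First Unicode version: lowercases one `char` at a time.
pub fn check_per_char(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .chars()
        .filter(|c| c.is_alphabetic())
        .map(|c| c.to_lowercase().to_string())
        .all(|c| hs.insert(c))
}

/// Refactored version: lowercases the whole string first, so
/// context-dependent rules such as word-final sigma can apply.
pub fn check_whole_str(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphabetic())
        .all(|c| hs.insert(c))
}
```

For `"ΣΑΣ"`, `check_per_char` returns `false` because both capitals become `σ`, while `check_whole_str` returns `true` because `str::to_lowercase` produces `"σας"`, ending in `ς`. Which result is preferable depends on whether `σ` and `ς` should count as the same letter.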

To support Unicode more completely, an external crate such as [unicode-segmentation][unicode-segmentation]
could be used.
This is because a [`char`][char] is a single Unicode scalar value, so it cannot fully handle things such as [grapheme clusters][grapheme-clusters].
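
As a standard-library-only illustration of that limitation (the `unicode-segmentation` API itself is not shown here), the same visible text `"ée"` can be written with a precomposed `é` (U+00E9) or with `e` followed by the combining acute accent (U+0301). The sketch below reuses the refactored `char`-based check and shows that the decomposed form is reported as repeating `e`.

```rust
use std::collections::HashSet;

/// The refactored `char`-based check from the section above.
pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphabetic())
        .all(|c| hs.insert(c))
}
```

`"\u{e9}e"` has two `char`s and passes the check, while `"e\u{301}e"` renders the same but has three `char`s, two of which are the plain letter `e`, so the check returns `false`. Grouping by grapheme cluster instead of by `char` would treat both spellings alike.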

[hashset]: https://doc.rust-lang.org/std/collections/struct.HashSet.html
[use]: https://doc.rust-lang.org/reference/items/use-declarations.html
[namespaces]: https://doc.rust-lang.org/reference/names/namespaces.html
[ascii]: https://www.asciitable.com/
[bytes]: https://doc.rust-lang.org/std/primitive.str.html#method.bytes
[u8]: https://doc.rust-lang.org/std/primitive.u8.html
[filter]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter
[closure]: https://doc.rust-lang.org/rust-by-example/fn/closures.html
[borrow]: https://doc.rust-lang.org/rust-by-example/scope/borrow.html
[reference]: https://doc.rust-lang.org/std/primitive.reference.html
[is-ascii-alphabetic]: https://doc.rust-lang.org/std/primitive.u8.html#method.is_ascii_alphabetic
[map]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map
[to-ascii-lowercase]: https://doc.rust-lang.org/std/primitive.u8.html#method.to_ascii_lowercase
[all]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.all
[insert]: https://doc.rust-lang.org/std/collections/struct.HashSet.html#method.insert
[str-to-ascii-lowercase]: https://doc.rust-lang.org/std/primitive.str.html#method.to_ascii_lowercase
[filter-map]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter_map
[then]: https://doc.rust-lang.org/std/primitive.bool.html#method.then
[chars]: https://doc.rust-lang.org/std/primitive.str.html#method.chars
[char]: https://doc.rust-lang.org/std/primitive.char.html
[is-alphabetic]: https://doc.rust-lang.org/std/primitive.char.html#method.is_alphabetic
[char-to-lowercase]: https://doc.rust-lang.org/std/primitive.char.html#method.to_lowercase
[str-to-lowercase]: https://doc.rust-lang.org/std/primitive.str.html#method.to_lowercase
[unicode]: https://en.wikipedia.org/wiki/Unicode
[unicode-segmentation]: https://crates.io/crates/unicode-segmentation
[grapheme-clusters]: https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries