Commit e76c930

isogram: add approaches (#1580)
isogram: add approaches Co-authored-by: Erik Schierboom <[email protected]>
1 parent c42cb82 commit e76c930

12 files changed, +679 -0 lines changed
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
# Bit field used functionally
2+
3+
```rust
const A_LCASE: u8 = 97;

pub fn check(candidate: &str) -> bool {
    candidate
        .bytes()
        .filter_map(|c| {
            c.is_ascii_alphabetic()
                .then(|| 1u32 << (c.to_ascii_lowercase() - A_LCASE))
        })
        .try_fold(0u32, |ltr_flags, ltr| {
            (ltr_flags & ltr == 0).then(|| ltr_flags | ltr)
        })
        .is_some()
}
```
19+
20+
This solution uses the [ASCII][ascii] value of the letter to set the corresponding bit position.
21+
22+
First, a [`const`][const] value is set with the ASCII value for `a`.
23+
24+
- Since all of the characters are [ASCII][ascii], they can be iterated with the [`bytes`][bytes] method.
25+
Each byte is iterated as a [`u8`][u8], which is an unsigned 8-bit integer, and is passed to the [filter_map][filter-map] method.
26+
- The [closure][closure] inside `filter_map` first tests if the byte [is_ascii_alphabetic][is-ascii-alphabetic].
27+
If so, the byte is passed to the [`then`][then] method, where the byte is set [`to_ascii_lowercase`][to-ascii-lowercase] in its closure.
28+
To understand what else is happening in `then`, consider the following:
29+
30+
- If the ASCII value of `a` is subtracted from the lower-cased letter, then `a` will result in `0`, because `97` minus `97` equals `0`.
31+
`z` would result in `25`, because `122` minus `97` equals `25`.
32+
So `a` would have `1u32` [shifted left][shift-left] 0 places (so not shifted at all) and `z` would have `1` shifted left 25 places.
33+
34+
So, for the [unsigned thirty-two bit integer][u32] (`1u32`), the value for `a` would look like
35+
36+
```
      zyxwvutsrqponmlkjihgfedcba
00000000000000000000000000000001
```
40+
41+
and the value for `z` would look like
42+
43+
```
      zyxwvutsrqponmlkjihgfedcba
00000010000000000000000000000000
```
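
The mapping from a byte to its bit can be sketched on its own. The helper names `letter_mask` and `as_bits` are assumptions made for this illustration and are not part of the approach; `A_LCASE` is repeated so the block stands alone.

```rust
const A_LCASE: u8 = 97;

/// Returns the single-bit mask for an ASCII letter, or `None` for any other byte.
/// (Hypothetical helper, named here only for illustration.)
pub fn letter_mask(c: u8) -> Option<u32> {
    c.is_ascii_alphabetic()
        .then(|| 1u32 << (c.to_ascii_lowercase() - A_LCASE))
}

/// Formats a mask as 32 binary digits so it can be compared with the diagrams.
pub fn as_bits(mask: u32) -> String {
    format!("{:032b}", mask)
}
```

Calling `as_bits(letter_mask(b'z').unwrap())` should reproduce the second diagram, and an upper-cased `Z` maps to the same mask because the byte is lowercased before the subtraction.
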
47+
48+
- The `filter_map` passes only lowercased ASCII letter bytes converted to `u32` values to the [try_fold][try-fold] method.
49+
50+
```rust
    // code snipped
        .try_fold(0u32, |ltr_flags, ltr| {
            (ltr_flags & ltr == 0).then(|| ltr_flags | ltr)
        })
        .is_some()
}
```
58+
59+
- The `try_fold` has its accumulator set to a `u32` initialized to `0`.
60+
The closure inside `try_fold` uses the [bitwise AND operator][and] to check if the bit for the letter position has not already been set.
61+
- If it has been set, you know the letter is duplicated and `try_fold` will "short circuit"
62+
and immediately pass [`None`][none] to the [`is_some`][is-some] method, which will return `false`.
63+
- If it has not been set, the [bitwise OR operator][or] is used in the `then` method to set the bit.
64+
If all of the iterations of `try_fold` complete without finding a duplicate letter (and returning `None`),
65+
the function returns `true` from the `is_some` method.
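
The short-circuiting behavior of `try_fold` can be seen in isolation with a sketch like the following. The function name `combine` and the idea of passing ready-made masks in a slice are assumptions for illustration only.

```rust
/// Folds single-bit masks into one bit field, stopping at the first repeat.
/// (Hypothetical helper; the approach itself folds directly over the bytes.)
pub fn combine(masks: &[u32]) -> Option<u32> {
    masks.iter().try_fold(0u32, |flags, &mask| {
        // `then` yields `Some(updated flags)` only when the bit was still clear;
        // a `None` here makes `try_fold` stop and return `None` immediately.
        (flags & mask == 0).then(|| flags | mask)
    })
}
```

With distinct masks the result is `Some` of the combined bits, so `is_some` is `true`; as soon as one mask repeats, the result is `None`.
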
66+
67+
## Refactoring
68+
69+
Since a `filter_map` is used, this approach could be refactored to use [`filter`][filter] and a [`map`][map].
70+
71+
```rust
pub fn check_bits(candidate: &str) -> bool {
    candidate
        .bytes()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| 1u32 << (c.to_ascii_lowercase() - A_LCASE))
        .try_fold(0u32, |ltr_flags, ltr| {
            (ltr_flags & ltr == 0).then(|| ltr_flags | ltr)
        })
        .is_some()
}
```
83+
84+
In benchmarking, this approach was slightly slower, but its style may be preferred.
85+
86+
[ascii]: https://www.asciitable.com/
87+
[const]: https://doc.rust-lang.org/std/keyword.const.html
88+
[bytes]: https://doc.rust-lang.org/std/primitive.str.html#method.bytes
89+
[u8]: https://doc.rust-lang.org/std/primitive.u8.html
90+
[filter-map]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.filter_map
91+
[closure]: https://doc.rust-lang.org/rust-by-example/fn/closures.html
92+
[is-ascii-alphabetic]: https://doc.rust-lang.org/std/primitive.u8.html#method.is_ascii_alphabetic
93+
[then]: https://doc.rust-lang.org/core/primitive.bool.html#method.then
94+
[to-ascii-lowercase]: https://doc.rust-lang.org/std/primitive.u8.html#method.to_ascii_lowercase
95+
[u32]: https://doc.rust-lang.org/std/primitive.u32.html
96+
[try-fold]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.try_fold
97+
[shift-left]: https://doc.rust-lang.org/std/ops/trait.Shl.html
98+
[and]: https://doc.rust-lang.org/std/ops/trait.BitAnd.html
99+
[none]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None
100+
[is-some]: https://doc.rust-lang.org/std/option/enum.Option.html#method.is_some
101+
[or]: https://doc.rust-lang.org/std/ops/trait.BitOr.html
102+
[filter]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter
103+
[map]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
candidate.bytes()
    .filter_map(|c| {
        c.is_ascii_alphabetic()
            .then(|| 1u32 << (c.to_ascii_lowercase() - A_LCASE))
    })
    .try_fold(0u32, |ltr_flags, ltr| {
        (ltr_flags & ltr == 0).then(|| ltr_flags | ltr)
    }).is_some()
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
# Bit field using a `for` loop
2+
3+
```rust
const A_LCASE: u8 = 97;
const Z_LCASE: u8 = 122;
const A_UCASE: u8 = 65;
const Z_UCASE: u8 = 90;

pub fn check(candidate: &str) -> bool {
    let mut letter_flags: u32 = 0;

    for letter in candidate.bytes() {
        if letter >= A_LCASE && letter <= Z_LCASE {
            if letter_flags & (1 << (letter - A_LCASE)) != 0 {
                return false;
            } else {
                letter_flags |= 1 << (letter - A_LCASE);
            }
        } else if letter >= A_UCASE && letter <= Z_UCASE {
            if letter_flags & (1 << (letter - A_UCASE)) != 0 {
                return false;
            } else {
                letter_flags |= 1 << (letter - A_UCASE);
            }
        }
    }
    true
}
```
30+
31+
This solution uses the [ASCII][ascii] value of the letter to set the corresponding bit position.
32+
33+
First, some [`const`][const] values are set.
34+
These values will be used for readability in the body of the `check` function.
35+
36+
An unsigned 32-bit integer ([`u32`][u32]) will be used to hold the twenty-six bits needed
37+
to keep track of the letters in the English alphabet.
38+
39+
The [`for` loop][for-loop] loops through the [bytes][bytes] of `candidate`.
40+
Each `letter` is a [`u8`][u8] which is tested for being `a` through `z` or `A` through `Z`.
41+
The ASCII values defined as the `const` values are used for that.
42+
The ASCII value for `a` is `97`, and for `z` is `122`.
43+
The ASCII value for `A` is `65`, and for `Z` is `90`.
44+
45+
- If the ASCII value of `a` is subtracted from the lower-cased letter, then `a` will result in `0`, because `97` minus `97` equals `0`.
46+
`z` would result in `25`, because `122` minus `97` equals `25`.
47+
So `a` would have `1` [shifted left][shift-left] 0 places (so not shifted at all) and `z` would have `1` shifted left 25 places.
48+
- If the ASCII value of `A` is subtracted from the upper-cased letter, then `A` will result in `0`, because `65` minus `65` equals `0`.
49+
`Z` would result in `25`, because `90` minus `65` equals `25`.
50+
So `A` would have `1` [shifted left][shift-left] 0 places (so not shifted at all) and `Z` would have `1` shifted left 25 places.
51+
52+
In that way, both a lower-cased `z` and an upper-cased `Z` can share the same position in the bit field.
53+
54+
So, for an unsigned thirty-two bit integer, if the values for `a` and `Z` were both set, the bits would look like
55+
56+
```
      zyxwvutsrqponmlkjihgfedcba
00000010000000000000000000000001
```
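
As a rough illustration of how both cases land on the same bit, the sketch below computes the bit position for a byte and builds the flags for a short string. The names `bit_index` and `flags_for` are hypothetical, and byte literals such as `b'a'` stand in for the `const` values.

```rust
/// Bit position (0 to 25) for an ASCII letter; upper and lower case share a position.
/// (Hypothetical helper, not part of the approach.)
pub fn bit_index(letter: u8) -> Option<u8> {
    match letter {
        b'a'..=b'z' => Some(letter - b'a'),
        b'A'..=b'Z' => Some(letter - b'A'),
        _ => None,
    }
}

/// Sets the bit for every ASCII letter in `letters`, ignoring other bytes.
pub fn flags_for(letters: &str) -> u32 {
    letters
        .bytes()
        .filter_map(bit_index)
        .fold(0u32, |flags, index| flags | (1u32 << index))
}
```

Formatting `flags_for("aZ")` with `{:032b}` should give the same 32 digits as the diagram above.
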
60+
61+
You can use the [bitwise AND operator][and] to check if a bit has already been set.
62+
If it has been set, you know the letter is duplicated and you can immediately return `false`.
63+
If it has not been set, you can use the [bitwise OR operator][or] to set the bit.
64+
If the loop completes without finding a duplicate letter (and returning `false`), the function returns `true`.
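
The AND-then-OR steps can also be written once for both cases. The variant below is only a sketch of the same idea under a hypothetical name, `check_with_match`; it uses a `match` on byte ranges instead of the four `const` values, which is a stylistic substitution rather than the approach shown above.

```rust
/// Returns `true` when no ASCII letter repeats, ignoring case and non-letters.
pub fn check_with_match(candidate: &str) -> bool {
    let mut letter_flags: u32 = 0;

    for letter in candidate.bytes() {
        // Work out the bit position once for either case; skip other bytes.
        let index = match letter {
            b'a'..=b'z' => letter - b'a',
            b'A'..=b'Z' => letter - b'A',
            _ => continue,
        };
        let mask = 1u32 << index;

        // Bitwise AND: a non-zero result means this letter was seen before.
        if letter_flags & mask != 0 {
            return false;
        }
        // Bitwise OR: record the letter as seen.
        letter_flags |= mask;
    }
    true
}
```
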
65+
66+
[ascii]: https://www.asciitable.com/
67+
[const]: https://doc.rust-lang.org/std/keyword.const.html
68+
[u32]: https://doc.rust-lang.org/std/primitive.u32.html
69+
[for-loop]: https://doc.rust-lang.org/reference/expressions/loop-expr.html#iterator-loops
70+
[bytes]: https://doc.rust-lang.org/std/primitive.str.html#method.bytes
71+
[u8]: https://doc.rust-lang.org/std/primitive.u8.html
72+
[shift-left]: https://doc.rust-lang.org/std/ops/trait.Shl.html
73+
[and]: https://doc.rust-lang.org/std/ops/trait.BitAnd.html
74+
[or]: https://doc.rust-lang.org/std/ops/trait.BitOr.html
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
let mut letter_flags: u32 = 0;

for letter in candidate.bytes() {
    if letter >= A_LCASE && letter <= Z_LCASE {
        if letter_flags & (1 << (letter - A_LCASE)) != 0 {
            return false;
        } else {
            letter_flags |= 1 << (letter - A_LCASE);
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
{
  "introduction": {
    "authors": ["bobahop"]
  },
  "approaches": [
    {
      "uuid": "345b3d89-55e6-4723-9f9f-9507c416207e",
      "slug": "filter-all",
      "title": "Filter with All on a HashSet",
      "blurb": "Use Filter with All on a HashSet to return the answer.",
      "authors": ["bobahop"]
    },
    {
      "uuid": "ee6b5efb-7780-45f3-ae85-9a6ba4142e57",
      "slug": "bitfield",
      "title": "Bit field using a for loop",
      "blurb": "Use a bit field with a for loop to keep track of used letters.",
      "authors": ["bobahop"]
    },
    {
      "uuid": "2a1c6080-2dfc-4d5b-841e-d26b22e1d061",
      "slug": "bitfield-functionally",
      "title": "Bit field used functionally",
      "blurb": "Use a bit field functionally to keep track of used letters.",
      "authors": ["bobahop"]
    }
  ]
}
Lines changed: 146 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,146 @@
1+
# `filter` and `map` with `all` on a `HashSet`
2+
3+
```rust
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .bytes()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .all(|c| hs.insert(c))
}
```
15+
16+
With this approach you will instantiate and update a [`HashSet`][hashset] to keep track of the used letters.
17+
18+
A [`use`][use] declaration allows directly calling `HashSet` instead of calling it with its entire [namespace][namespaces].
19+
Without the `use` declaration, the `HashSet` would be instantiated like so
20+
21+
```rust
let mut hs = std::collections::HashSet::new();
```
24+
25+
After the `HashSet` is instantiated, a series of functions are chained from the `candidate` `&str`.
26+
- Since all of the characters are [ASCII][ascii], they can be iterated with the [`bytes`][bytes] method.
27+
Each byte is iterated as a [`u8`][u8], which is an unsigned 8-bit integer.
28+
- The [`filter`][filter] method [borrows][borrow] each byte as a [reference][reference] to a `u8` (`&u8`).
29+
Inside of its [closure][closure] it tests each byte to see if it [`is_ascii_alphabetic`][is-ascii-alphabetic].
30+
Only bytes which are ASCII letters will survive the `filter` to be passed on to the [`map`][map] method.
31+
- The `map` method calls [`to_ascii_lowercase`][to-ascii-lowercase] on each byte.
32+
- Each lowercased byte is then tested by the [`all`][all] method by using the [`insert`][insert] method of `HashSet`.
33+
`all` will return `true` if every call to `insert` returns `true`.
34+
If a call to `insert` returns `false` then `all` will "short-circuit" and immediately return `false`.
35+
The `insert` method returns whether the value is _newly_ inserted.
36+
So, for the word `"alpha"`, `insert` will return `true` when the first `a` is inserted,
37+
but will return `false` when the second `a` is inserted.
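
To see what `insert` reports for each byte, and where `all` stops, a small sketch may help. Both function names, `insert_results` and `inserts_attempted`, are made up for this illustration.

```rust
use std::collections::HashSet;

/// Records, for each byte of `word`, whether `insert` reported it as newly added.
pub fn insert_results(word: &str) -> Vec<bool> {
    let mut seen = HashSet::new();
    word.bytes().map(|b| seen.insert(b)).collect()
}

/// Counts how many bytes `all` looks at before it finishes or short-circuits.
pub fn inserts_attempted(word: &str) -> usize {
    let mut seen = HashSet::new();
    let mut attempts = 0;
    let _ = word.bytes().all(|b| {
        attempts += 1;
        seen.insert(b)
    });
    attempts
}
```

For `"alpha"` the first four inserts report `true` and the fifth reports `false`. For `"alphabet"`, `all` stops at that same fifth byte and never looks at `b`, `e`, or `t`.
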
38+
39+
## Refactoring
40+
41+
### using the `str` method [to_ascii_lowercase][str-to-ascii-lowercase] and no `map`
42+
43+
You might want to call the `str` method [to_ascii_lowercase][str-to-ascii-lowercase] and save calling `map`,
44+
like so
45+
46+
```rust
candidate
    .to_ascii_lowercase()
    .bytes()
    .filter(|c| c.is_ascii_alphabetic())
    .all(|c| hs.insert(c))
```
53+
54+
However, changing the case of all characters in a `str` raised the average benchmark a few nanoseconds.
55+
It is a bit faster to `filter` out the bytes that are not ASCII letters and then change the case of only the surviving bytes.
56+
Since the performance is fairly close, either may be preferred.
57+
58+
### using `filter_map`
59+
60+
Since `filter` and `map` are used, this approach could be refactored using the [`filter_map`][filter-map] method.
61+
62+
```rust
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .bytes()
        .filter_map(|c| c.is_ascii_alphabetic().then(|| c.to_ascii_lowercase()))
        .all(|c| hs.insert(c))
}
```
73+
74+
By chaining the [`then`][then] method to the result of `is_ascii_alphabetic`,
75+
and calling `to_ascii_lowercase` in the closure for `then`,
76+
the `filter_map` passes only lowercased ASCII letter bytes to the `all` method.
77+
In benchmarking, this approach was slightly slower, but its style may be preferred.
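
The way `then` turns a `bool` into the `Option` that `filter_map` expects can be sketched separately. The helper names `lower_if_letter` and `surviving_bytes` are assumptions for illustration.

```rust
/// Returns the lowercased byte for an ASCII letter and `None` otherwise.
/// `bool::then` runs its closure only when the `bool` is `true`.
pub fn lower_if_letter(c: u8) -> Option<u8> {
    c.is_ascii_alphabetic().then(|| c.to_ascii_lowercase())
}

/// Collects the bytes that `filter_map` would pass along to `all`.
pub fn surviving_bytes(candidate: &str) -> Vec<u8> {
    candidate.bytes().filter_map(lower_if_letter).collect()
}
```

For example, `surviving_bytes("Six-Year")` keeps only the letters, lowercased, and drops the hyphen.
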
78+
79+
### supporting [Unicode][unicode]
80+
81+
By substituting the [`chars`][chars] method for the `bytes` method,
82+
and by using the [`char`][char] methods [`is_alphabetic`][is-alphabetic] and [`to_lowercase`][char-to-lowercase],
83+
this approach can support Unicode characters.
84+
85+
```rust
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    // `HashSet` is already in scope through the `use` declaration above.
    let mut hs = HashSet::new();
    candidate
        .chars()
        .filter(|c| c.is_alphabetic())
        .map(|c| c.to_lowercase().to_string())
        .all(|c| hs.insert(c))
}
```
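
One likely reason the `map` above collects into a `String` is that `char::to_lowercase` returns an iterator, because lowercasing one `char` can produce more than one `char`. The following sketch, with the hypothetical name `lowercase_key`, shows such a case.

```rust
/// Lowercases one `char` into a `String`, since the result may be several `char`s.
/// (Hypothetical helper mirroring the closure passed to `map`.)
pub fn lowercase_key(c: char) -> String {
    c.to_lowercase().to_string()
}
```

`lowercase_key('A')` is simply `"a"`, while `lowercase_key('\u{130}')` (Latin capital I with dot above) is `"i\u{307}"`, which is two `char`s. Allocating a `String` per character is a plausible contributor to the slower benchmark.
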
97+
98+
Usually an approach that supports Unicode will be slower than one that supports only bytes.
99+
In this case the difference was significant: the benchmark took more than twice as long as the bytes approach.
100+
It can be further refactored to use the `str` [to_lowercase][str-to-lowercase] method and remove the `map` method
101+
to cut the benchmark down closer to the byte approach.
102+
103+
```rust
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    // `HashSet` is already in scope through the `use` declaration above.
    let mut hs = HashSet::new();
    candidate
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphabetic())
        .all(|c| hs.insert(c))
}
```
115+
116+
To more completely support Unicode, an external crate, such as [unicode-segmentation][unicode-segmentation],
117+
could be used.
118+
This is because [std::char][char] cannot fully handle things such as [grapheme clusters][grapheme-clusters].
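
A standard-library-only sketch can show the limitation without pulling in an external crate. The function name `check_chars` is an assumption; it repeats the `chars`-based refactoring so the block stands alone.

```rust
use std::collections::HashSet;

/// The `chars`-based check, repeated here under a hypothetical name.
pub fn check_chars(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphabetic())
        .all(|c| hs.insert(c))
}

/// Counts the `char`s (Unicode scalar values) in a string.
pub fn scalar_count(s: &str) -> usize {
    s.chars().count()
}
```

The precomposed string `"\u{e9}\u{e8}"` (é followed by è) passes the check, but the visually identical `"e\u{301}e\u{300}"`, written with combining accents, fails because the base letter `e` appears twice as a separate `char`. Treating each accented letter as one unit requires grapheme cluster segmentation.
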
119+
120+
[hashset]: https://doc.rust-lang.org/std/collections/struct.HashSet.html
121+
[use]: https://doc.rust-lang.org/reference/items/use-declarations.html
122+
[namespaces]: https://doc.rust-lang.org/reference/names/namespaces.html
123+
[ascii]: https://www.asciitable.com/
124+
[bytes]: https://doc.rust-lang.org/std/primitive.str.html#method.bytes
125+
[u8]: https://doc.rust-lang.org/std/primitive.u8.html
126+
[filter]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter
127+
[closure]: https://doc.rust-lang.org/rust-by-example/fn/closures.html
128+
[borrow]: https://doc.rust-lang.org/rust-by-example/scope/borrow.html
129+
[reference]: https://doc.rust-lang.org/std/primitive.reference.html
130+
[is-ascii-alphabetic]: https://doc.rust-lang.org/std/primitive.u8.html#method.is_ascii_alphabetic
131+
[map]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map
132+
[to-ascii-lowercase]: https://doc.rust-lang.org/std/primitive.u8.html#method.to_ascii_lowercase
133+
[all]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.all
134+
[insert]: https://doc.rust-lang.org/std/collections/struct.HashSet.html#method.insert
135+
[str-to-ascii-lowercase]: https://doc.rust-lang.org/std/primitive.str.html#method.to_ascii_lowercase
136+
[filter-map]: https://doc.rust-lang.org/core/iter/trait.Iterator.html#method.filter_map
137+
[then]: https://doc.rust-lang.org/core/primitive.bool.html#method.then
138+
[chars]: https://doc.rust-lang.org/core/primitive.str.html#method.chars
139+
[char]: https://doc.rust-lang.org/std/primitive.char.html
140+
[is-alphabetic]: https://doc.rust-lang.org/core/primitive.char.html#method.is_alphabetic
141+
[char-to-lowercase]: https://doc.rust-lang.org/core/primitive.char.html#method.to_lowercase
142+
[str-to-lowercase]: https://doc.rust-lang.org/std/primitive.str.html#method.to_lowercase
143+
[unicode]: https://en.wikipedia.org/wiki/Unicode
144+
[unicode-segmentation]: https://crates.io/crates/unicode-segmentation
146+
[grapheme-clusters]: https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
pub fn check(candidate: &str) -> bool {
    let mut hs = std::collections::HashSet::new();
    candidate
        .bytes()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .all(|c| hs.insert(c))
}
