|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Closed-form Bit Magic for OEIS A290255" |
| 4 | +date: 2024-01-02 |
| 5 | +no_toc: true |
| 6 | +--- |
| 7 | + |
| 8 | +Ah, the [On-Line Encyclopedia of Integer Sequences](https://oeis.org/). |
| 9 | + |
| 10 | +{:style="width: 100%;"} |
| 11 | + |
| 12 | +The next best thing to magic. |
| 13 | + |
| 14 | +Not only can you usually pick up a nice closed-form expression for whatever numbers you paste in, but more often than not you'll get a peek into your numbers' natural habitat in some strange (and often frightening) realm of mathematics. |
| 15 | +Careful! |
| 16 | +Don't think too hard about why your problem's secret isomorphism to [gray-coded](https://en.wikipedia.org/wiki/Gray_code) [Hadamard matrices](https://en.wikipedia.org/wiki/Hadamard_matrix) lest your mind become boggled. |
| 17 | +Just close that door back up, and you can go about coding in your new [top-tier arcana](https://en.wikipedia.org/wiki/Fast_inverse_square_root). |
| 18 | + |
| 19 | +For those unfamiliar, OEIS is a database of integer sequences. |
| 20 | +You plug in a few terms and get back a list of possible generating sequences with references to the literature and links to related sequences. |
| 21 | +(The New York Times did a nice feature on OEIS a few months back that's [well worth a read](https://www.nytimes.com/2023/05/21/science/math-puzzles-integer-sequences.html).) |
| 22 | + |
| 23 | +Using OEIS today to solve some problems related to [hereditary stratigraphy](https://mmore500.com/0001/01/01/hstrat.html), I crossed paths with [A290255](https://oeis.org/A290255). |
| 24 | + |
| 25 | +```python3 |
| 26 | +A290255 = [0,1,0,2,1,0,0,3,2,1,1,0,0,0,0,4,3,2,2,1,1,1,1,0, |
| 27 | + 0,0,0,0,0,0,0,5,4,3,3,2,2,2,2,1,1,1,1,1,1,1,1,0,0, |
| 28 | + 0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,5,4,4,3,3,3,3,2,2,2, |
| 29 | + 2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0, |
| 30 | + 0,0,0,0] |
| 31 | +``` |
| 32 | + |
| 33 | +This sequence ended up being what I needed, but OEIS listed no closed-form expression for the sequence. |
| 34 | + |
| 35 | +The description --- though --- gave enough *aha!* to light the way. |
| 36 | +Turns out, this sequence is just the "number of 0's following directly the first 1 in the binary representation of n." |
| 37 | + |
| 38 | +## Implementation |
| 39 | + |
| 40 | +I needed a closed-form expression for my use case, and here's what I came up with. |
| 41 | + |
| 42 | +```python3 |
| 43 | +def bit_floor(n: int) -> int: |
| 44 | + """Calculate the largest power of two not greater than n. |
| 45 | +
|
| 46 | + If zero, returns zero. |
| 47 | + """ |
| 48 | + mask = 1 << (n >> 1).bit_length() |
| 49 | + return n & mask |
| 50 | + |
| 51 | + |
| 52 | +def bit_drop_msb(n: int) -> int: |
| 53 | + """Drop most significant bit from binary representation of integer n.""" |
| 54 | + return n & (~bit_floor(n)) |
| 55 | + |
| 56 | + |
| 57 | +def bit_count_immediate_zeros(x: int) -> int: |
| 58 | + """Count the number of zeros immediately following the first binary digit |
| 59 | + of x.""" |
| 60 | + # https://oeis.org/A290255 |
| 61 | + return x.bit_length() - bit_drop_msb(x).bit_length() - bool(x) |
| 62 | +``` |
| 63 | + |
| 64 | +We can smoosh it all together to get a sufficiently ghastly one-liner. |
| 65 | + |
| 66 | +```python3 |
| 67 | + |
| 68 | +This all comes together as |
| 69 | +```python3 |
| 70 | +def bit_count_immediate_zeros(n: int) -> int: |
| 71 | + return n.bit_length() - (n & (~(1 << (n >> 1).bit_length()))).bit_length() - bool(n) |
| 72 | + |
| 73 | +``` |
| 74 | + |
| 75 | +Note that if you're using *numpy* and might have *numpy* data types floating around, you'll want to explicitly cast inputs to `int` to get access to Python's built-in bit manipulation methods. |
| 76 | + |
| 77 | +## How Does it Work? |
| 78 | + |
| 79 | +How does this work? |
| 80 | +The trick is dropping the most significant bit. |
| 81 | +This will expose the immediate 0's (i.e., directly following the leading 1), if any, and they'll collapse away in Python's integer representation. |
| 82 | +Afterwards, the next following 1 will be left most significant bit if present. |
| 83 | +We can test how many bits were stripped away by subtracting the new bit length from the old bit length. |
| 84 | + |
| 85 | +Note that we need to account for the leading 1 in the bit length, which is taken care of by the last term. |
| 86 | +We'd like to subtract one from the bit length of the original number, but only if there was a bit there in the first place. |
| 87 | +The `bool(x)` tacked on the end does this for us --- it will evaluate as 1 if `x` is positive and 0 if `x` is zero. |
| 88 | + |
| 89 | +What about `bit_drop_msb` and `bit_floor`? |
| 90 | + |
| 91 | +In `bit_drop_msb`, the call to `bit_floor` isolates the leading one in the binary representation of `n`. |
| 92 | +This leading bit lets us easily generate an inverse mask --- i.e., with all other non-leading bits set to 1. |
| 93 | +Masking the original `n` and keeping only masked bits leaves everything else in place except for the most significant bit. |
| 94 | + |
| 95 | +In `bit_floor`, our goal is to find the largest power of two not greater than `n`. |
| 96 | +This amounts to isolating the most significant bit of `n`. |
| 97 | +A simple way to do this would be counting the number of bits in `n` and then shifting 1 up to that power of two. |
| 98 | +Something like `1 << (n.bit_length() - 1)`. |
| 99 | +Note that we have to subtract one because the 1 we are shifting up is already "shifted" into the first (i.e., 0th) position. |
| 100 | + |
| 101 | +The `n >> 1` in `bit_floor` is a trick for zero-handling. |
| 102 | +If `n` is zero, `n >> 1` will simply nop, and `bit_floor` will return zero. |
| 103 | +Otherwise, we can just shift 1 up to the power of two of `n`'s most significant bit. |
| 104 | +We no longer have to subtract one due to the `n >> 1` downshift. |
| 105 | + |
| 106 | +## Validation |
| 107 | + |
| 108 | +Let's check against the reference sequence entries. |
| 109 | + |
| 110 | +```python3 |
| 111 | +>>> import itertools as it |
| 112 | +>>> juxtaposed = zip( |
| 113 | +... map(bit_count_immediate_zeros, it.count(1)), # oeis is 1-indexed |
| 114 | +... A290255, |
| 115 | +... ) |
| 116 | +>>> all(it.starmap(int.__eq__, juxtaposed)) |
| 117 | +True |
| 118 | +``` |
| 119 | + |
| 120 | +Okay, let's do a little more validation by comparing against a string-based implementation. |
| 121 | + |
| 122 | +```python3 |
| 123 | +def bit_count_immediate_zeros_str(n: int) -> int: |
| 124 | + """Count the number of zeros immediately following the first binary digit |
| 125 | + of x.""" |
| 126 | + return f"{n:b}1"[1:].index("1") # fstring "1" ensures a trailing 1 |
| 127 | +``` |
| 128 | + |
| 129 | +We can test our alternate implementation against the reference sequence entries. |
| 130 | + |
| 131 | +```python3 |
| 132 | +>>> juxtaposed = zip( |
| 133 | +... map(bit_count_immediate_zeros_str, it.count(1)), # oeis is 1-indexed |
| 134 | +... A290255, |
| 135 | +... ) |
| 136 | +>>> all(it.starmap(int.__eq__, juxtaposed)) |
| 137 | +True |
| 138 | +``` |
| 139 | + |
| 140 | +And then test against one another. |
| 141 | + |
| 142 | +```python3 |
| 143 | +>>> test_indices = [ |
| 144 | +... 0, 1, 2, |
| 145 | +... *map(int, np.linspace(0, 2**32, 10**5, dtype=int)), |
| 146 | +... *map(int, np.geomspace(1, 2**32, 10**5, dtype=int)), |
| 147 | +... *map(int, np.random.randint(0, 2**32, 10**5)), |
| 148 | +... ] |
| 149 | +>>> juxtaposed = zip( |
| 150 | +... map(bit_count_immediate_zeros, test_indices), |
| 151 | +... map(bit_count_immediate_zeros_str, test_indices), |
| 152 | +... ) |
| 153 | +>>> all(it.starmap(int.__eq__, juxtaposed)) |
| 154 | +True |
| 155 | +``` |
| 156 | + |
| 157 | +Yup, everything checks out. |
| 158 | +Both implementations agree over integers sampled up to `2**32`! |
0 commit comments