Reduced QuickDecode memory consumption by 128,884x #420
Conversation
It looks like your second commit is a "fixup" of the first. Can you squash and rebase your PR?
The second commit was a whole different optimization. The PR is already up to date with the master branch, and I've also updated the description of the PR to explain the changes.
If it makes semantic sense to keep them separated, could you at least rework the commit messages to better explain these changes? At the moment, they are just "reduced quickdecode memory usage by 8x" and "tiny table". It is not clear from these what technically changed in these commits. I know it is described here in the PR, but could you please summarise these changes on a technical level in the commit messages, so that it is easier for someone reading the git history to understand what changed and why?
Force-pushed from 7995110 to 244866d
All I did was rename the commit; I don't understand why this test failed.
Allows them to be stored in CPU registers and avoids memcpy-ing the struct (cherry picked from commit e504236)
…into opti-quickdecode
I think I messed up a bit; I'm not really familiar with rebase. Should I just open up a new branch and PR with just this optimization? @christian-rauch
Pigeonhole-principle based quick decode table
Problem Statement
The previous AprilTag decoding implementation relied on exhaustive precomputation of error permutations to achieve O(1) lookups. While effective for small tag families (e.g., 16h5), this approach is computationally and spatially prohibitive for larger families such as 52h13.
For the 52h13 family, precomputing all variations with up to 2-bit errors results in approximately 67 million hash map entries, which amounts to roughly 3 GB of RAM for this one check. Extending this to 3-bit errors would grow the required space combinatorially to approximately 54 GB, making the approach essentially unusable. Even for smaller tag families, dedicating this many resources to a single check is not possible in many small robotics applications.
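As a rough sanity check of these figures (the ~48,714 codes of the standard 52h13 family, a hash table load factor of 3, and 16-byte entries are assumptions inferred from the numbers in this PR, not values stated here):

$$
1 + 52 + \binom{52}{2} = 1{,}379 \ \text{variations per code}, \qquad 1{,}379 \times 48{,}714 \approx 67 \ \text{million entries} \ \Rightarrow\ 67 \times 10^6 \times 3 \times 16\ \text{B} \approx 3.2\ \text{GB},
$$

$$
1 + 52 + \binom{52}{2} + \binom{52}{3} = 23{,}479 \ \text{variations per code} \ \Rightarrow\ 23{,}479 \times 48{,}714 \times 3 \times 16\ \text{B} \approx 55\ \text{GB}.
$$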
Proposed Solution
This PR replaces the combinatorial precomputation strategy with a search-based algorithm utilizing the Pigeonhole Principle.
Instead of storing every possible error permutation, the new implementation indexes the valid tags by splitting the code into 4 discrete chunks (e.g., 13 bits each for 52h13). Given a maximum tolerance of 3-bit errors, the Pigeonhole Principle guarantees that at least one of the four chunks of an observed tag must match the corresponding chunk of the valid tag perfectly.
The decoding process is updated to:
1. Split the observed code into the same 4 chunks used to index the valid tags.
2. Look up each chunk in its per-chunk index to collect the candidate codes that match that chunk exactly.
3. Verify each candidate by computing the full Hamming distance against the observed code, accepting it only if it falls within the configured error tolerance.

A minimal sketch of this scheme follows.
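Below is a minimal C sketch of the idea, assuming 52-bit codes split into four 13-bit chunks. The identifiers (`chunk_index_t`, `chunk_index_build`, `chunk_index_lookup`) and the CSR-style bucket layout are illustrative choices, not the actual code added by this PR; rotation handling and the id/hamming bookkeeping of the real quick decode path are omitted:

```c
/* Sketch of a pigeonhole-based quick decode table (illustrative only). */
#include <stdint.h>
#include <stdlib.h>

#define NBITS      52
#define NCHUNKS    4
#define CHUNK_BITS (NBITS / NCHUNKS)        /* 13 bits per chunk */
#define NBUCKETS   (1u << CHUNK_BITS)       /* 8192 possible chunk values */

typedef struct {
    /* CSR-style buckets: for chunk position p, the indices of codes whose
     * chunk equals v are items[p][start[p][v] .. start[p][v+1]-1]. */
    uint32_t *start[NCHUNKS];               /* NBUCKETS+1 offsets each */
    uint32_t *items[NCHUNKS];               /* ncodes entries each */
    const uint64_t *codes;                  /* the valid codes of the family */
    uint32_t ncodes;
} chunk_index_t;

static inline uint32_t chunk_of(uint64_t code, int pos)
{
    return (uint32_t)((code >> (pos * CHUNK_BITS)) & (NBUCKETS - 1));
}

/* Index the valid codes with one counting-sort pass per chunk position. */
static void chunk_index_build(chunk_index_t *idx, const uint64_t *codes, uint32_t ncodes)
{
    idx->codes = codes;
    idx->ncodes = ncodes;
    for (int pos = 0; pos < NCHUNKS; pos++) {
        idx->start[pos] = calloc(NBUCKETS + 1, sizeof(uint32_t));
        idx->items[pos] = malloc(ncodes * sizeof(uint32_t));
        uint32_t *fill = calloc(NBUCKETS, sizeof(uint32_t));
        for (uint32_t c = 0; c < ncodes; c++)
            idx->start[pos][chunk_of(codes[c], pos) + 1]++;
        for (uint32_t v = 0; v < NBUCKETS; v++)
            idx->start[pos][v + 1] += idx->start[pos][v];
        for (uint32_t c = 0; c < ncodes; c++) {
            uint32_t v = chunk_of(codes[c], pos);
            idx->items[pos][idx->start[pos][v] + fill[v]++] = c;
        }
        free(fill);
    }
}

/* Look up an observed code. Since <= 3 errors spread over 4 chunks leave at
 * least one chunk untouched, scanning the bucket of each chunk visits every
 * valid code within 3 bits of `observed`. Returns the index of a match within
 * `max_hamming` (<= NCHUNKS-1) errors, or -1 if none is found. */
static int32_t chunk_index_lookup(const chunk_index_t *idx, uint64_t observed, int max_hamming)
{
    for (int pos = 0; pos < NCHUNKS; pos++) {
        uint32_t v = chunk_of(observed, pos);
        for (uint32_t i = idx->start[pos][v]; i < idx->start[pos][v + 1]; i++) {
            uint32_t cand = idx->items[pos][i];
            /* Full verification: Hamming distance over all 52 bits
             * (__builtin_popcountll is a GCC/Clang builtin). */
            if (__builtin_popcountll(observed ^ idx->codes[cand]) <= max_hamming)
                return (int32_t)cand;
        }
    }
    return -1;
}
```

With roughly 48k codes spread over 8,192 bucket values per chunk position, each lookup only scans a handful of candidates (on the order of 6 per bucket on average) instead of consulting a table of millions of precomputed permutations.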
Memory Complexity
The legacy implementation required storing every error permutation ($1 + N + \binom{N}{2} + \binom{N}{3}$ for an $N$-bit code), multiplied by a load factor of 3 to resolve collisions. For the 52h13 family, this necessitated allocating over 69,000 slots per tag (approx. 3 billion total slots). The new implementation eliminates this combinatorial explosion, storing exactly 4 references per tag. The 52h13 memory footprint went from ~54 GB to ~450 KB, which is nearly a 128,884x reduction.
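Plugging $N = 52$ into the formula above (again assuming the standard 52h13 family's ~48,714 codes, which is not stated in this PR), the slot counts work out roughly as:

$$
\left(1 + 52 + \binom{52}{2} + \binom{52}{3}\right) \times 3 = 23{,}479 \times 3 = 70{,}437 \ \text{slots per tag}, \qquad 70{,}437 \times 48{,}714 \approx 3.4 \times 10^9 \ \text{slots in total},
$$

versus $4 \times 48{,}714 \approx 195{,}000$ chunk references in the new table.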
In summary -