Regarding the implementation of the remove() method #3
Replies: 2 comments 2 replies
-
We have a few options that came to my mind, but I believe they are all already popularly known. 1. The Canonical Approach: memmove (Shifting Elements) 2. High-Performance Alternative for Removals: swap_remove 3. Approaches Using Auxiliary Data Structures Using a FixedBitSet (or BitVec) How It Works: remove(index): Removing becomes a very fast operation. You simply access the FixedBitSet and set the bit at position index to false. Iteration: When iterating over the CustomVec, you first need to query the FixedBitSet. If the bit at i is true, you process the element; otherwise, you ignore it. Logical Index Access: The concept of "index" changes. The "third valid element" might not be at index 2. To find the kth element, you would need to iterate through the BitSet to find the kth position marked as true. Compaction: Periodically, you can run a compact() function that creates a new array containing only the valid elements, freeing the "holes" and resetting the BitSet.
Personally I was thinking about implementing both variants, |
Beta Was this translation helpful? Give feedback.
-
Plz verify the fn remove: I apologize for the delay, it's because at that time I'm still working, and I have little time left to dedicate. Detailed Analysis of the
|
Step | Operation | State of Chunks | len |
---|---|---|---|
1 | Initial State | [[1, 2], [3, 4], [5, 6], [7, 8]] |
8 |
2 | ptr::read(data[1][0]) |
The value 3 is read and saved for return. The slot data[1][0] now contains invalid data. |
8 |
3 | Intra-Chunk Shift | ptr::copy moves 4 to position 0 of chunk 1. Chunk 1 becomes [4, 4] . |
8 |
4 | Inter-Chunk Shift (i=1) | The value 5 (from data[2][0] ) is moved to the end of chunk 1 (data[1][1] ). The rest of chunk 2 is shifted. |
8 |
Partial Result | [[1, 2], [4, 5], [6, 6], [7, 8]] |
8 | |
5 | Inter-Chunk Shift (i=2) | The value 7 (from data[3][0] ) is moved to the end of chunk 2 (data[2][1] ). The rest of chunk 3 is shifted. |
8 |
Partial Result | [[1, 2], [4, 5], [6, 7], [8, 8]] |
8 | |
6 | Cleanup (MaybeUninit ) |
The if index < self.len - 1 (2 < 7) is true. The last slot (data[3][1] ) contains a duplicate 8 and must be invalidated. |
8 |
Result | [[1, 2], [4, 5], [6, 7], [8, <uninit>]] |
8 | |
7 | Finalization | self.len is decremented, and the function returns the value 3 . |
7 |
8 | Final State | [[1, 2], [4, 5], [6, 7], [8, <uninit>]] |
7 |
This table shows why the ... = MaybeUninit::uninit()
line is crucial. Without it, we would have two copies of the value 8
, which would cause a double drop if T
were a type like String
.
Example 2: Removing the Last Element (Edge Case)
Now, for the case where the removal happens at the end of the vector.
- Initial State:
len = 8
,N = 2
. - Data:
[[1, 2], [3, 4], [5, 6], [7, 8]]
- Operation:
remove(index: 7)
(removing the value8
).
Step | Operation | State of Chunks | len |
---|---|---|---|
1 | Initial State | [[1, 2], [3, 4], [5, 6], [7, 8]] |
8 |
2 | ptr::read(data[3][1]) |
The value 8 is read and saved for return. |
8 |
3 | Intra-Chunk Shift | count is 0. The ptr::copy is skipped. No shifting occurs. |
8 |
4 | Inter-Chunk Shift | The for loop does not execute, as current_chunk_idx (3 ) is not less than until_chunk_idx (3 ). |
8 |
5 | Cleanup (MaybeUninit ) |
The condition if index < self.len - 1 (7 < 7) is false. The cleanup line is correctly skipped. |
8 |
6 | Finalization | self.len is decremented, and the function returns the value 8 . |
7 |
7 | Final State | [[1, 2], [3, 4], [5, 6], [7, 8]] (the last slot data[3][1] is logically inaccessible). |
7 |
In this case, the subsequent truncate
logic may remove the last chunk if it becomes entirely empty after the length is decreased. The absence of shifting makes the cleanup unnecessary, and the implemented if
condition handles this perfectly.
I hope this analysis helps!
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Hi @incapdns !
I'd like to continue our discussion about the
remove()
implementation here in the discussions section for better visibility and future reference.For context, this stems from our conversation in PR #1 where you raised an excellent point about the potential memory safety issues with implementing
remove()
in a way that would create "holes" in our chunked structure.Thank you for the thoughtful feedback! It's great to have someone with such insight reviewing the code. I'm also quite new to Rust, so it's incredibly encouraging to hear that you found the crate interesting.
You've raised an excellent point about the potential issues with implementing
remove()
in a way that would create "holes" in our chunked structure. Let me share my thoughts on this:The Cost of Tracking Initialization State
Maintaining an
initialized
bitmap or similar structure would indeed introduce significant overhead. We'd need to:This seems like it would add substantial memory and computational overhead for what should be a relatively simple data structure.
The "Holes" Problem
If we allowed
remove()
to create holes in our chunks, it would complicate virtually every other operation:vec[index]
when there might be uninitialized slots before that index?These complications would essentially change the fundamental nature of our data structure.
Learning from std::Vec
Looking at how
std::Vec
handles this, here's roughly howVec::remove()
works:The key insight is that
Vec
maintains contiguity by shifting elements to fill the gap. While this is O(n) forVec
, it would be somewhat more expensive forChunkedVec
since we'd need to potentially shift elements across chunk boundaries. However, I think this trade-off is worth it to maintain the simplicity and predictability of our data structure.Alternative: swap_remove()
For cases where removal order doesn't matter and performance is critical, we could also implement
swap_remove()
:This would be O(1) and maintain contiguity, similar to
Vec::swap_remove()
.Moving Forward
I'm completely open to other approaches and would love to hear your thoughts on this direction. The shift-based
remove()
seems like it would:Drop
implementationWhat do you think about this approach? Are there other considerations I'm missing?
Thanks again for the excellent feedback – discussions like this are exactly what make open source development so valuable!
Beta Was this translation helpful? Give feedback.
All reactions