Regarding the implementation of the remove() method #3

QuarkPixel · 2025-08-13T03:02:32Z

QuarkPixel
Aug 13, 2025
Maintainer

I'd like to continue our discussion about the remove() implementation here in the discussions section for better visibility and future reference.

For context, this stems from our conversation in PR #1 where you raised an excellent point about the potential memory safety issues with implementing remove() in a way that would create "holes" in our chunked structure.

Thank you for the thoughtful feedback! It's great to have someone with such insight reviewing the code. I'm also quite new to Rust, so it's incredibly encouraging to hear that you found the crate interesting.

Let me share my thoughts on the "holes" issue:

The Cost of Tracking Initialization State

Maintaining an initialization bitmap or a similar structure would indeed introduce significant overhead. We'd need to:

Track which specific slots in each chunk are initialized
Update this metadata on every insertion/removal
Check initialization state during access operations
Handle the complexity of partial chunk states

This seems like it would add substantial memory and computational overhead for what should be a relatively simple data structure.

The "Holes" Problem

If we allowed remove() to create holes in our chunks, it would complicate virtually every other operation:

Indexing: How do we handle vec[index] when there might be uninitialized slots before that index?
Iteration: We'd need to skip over holes, making iteration much more complex
Push operations: Where do we insert new elements? Fill holes first, or always append?
Memory layout: The core appeal of ChunkedVec is its predictable, efficient memory layout, which holes would undermine

These complications would essentially change the fundamental nature of our data structure.

Learning from std::Vec

Here is roughly how the standard library's Vec::remove() works:

pub fn remove(&mut self, index: usize) -> T {
    assert!(index < self.len, "removal index out of bounds");
    
    unsafe {
        // Read the element to be removed
        let ret = ptr::read(self.as_ptr().add(index));
        
        // Shift all subsequent elements one position to the left
        ptr::copy(
            self.as_ptr().add(index + 1),    // source
            self.as_mut_ptr().add(index),    // destination  
            self.len - index - 1             // count
        );
        
        self.len -= 1;
        ret
    }
}

The key insight is that Vec maintains contiguity by shifting elements to fill the gap. This is O(n) for Vec. For ChunkedVec it would still be O(n), but somewhat more expensive in practice, since we'd potentially need to move elements across chunk boundaries. However, I think this trade-off is worth it to maintain the simplicity and predictability of our data structure.
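To make the cross-chunk shifting concrete, here is a minimal safe sketch. It is not the crate's implementation: `SimpleChunked`, its fields, and its `Vec<Vec<T>>` layout are assumptions made purely for illustration, and the real ChunkedVec works on raw chunk storage instead.

```rust
/// Hypothetical stand-in for a chunked vector (illustration only).
/// Invariant: every chunk except possibly the last holds exactly
/// `chunk_size` elements, and no chunk is empty.
struct SimpleChunked<T> {
    chunks: Vec<Vec<T>>,
    chunk_size: usize,
}

impl<T> SimpleChunked<T> {
    fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be positive");
        Self { chunks: Vec::new(), chunk_size }
    }

    fn len(&self) -> usize {
        match self.chunks.last() {
            Some(last) => (self.chunks.len() - 1) * self.chunk_size + last.len(),
            None => 0,
        }
    }

    fn push(&mut self, value: T) {
        let last_is_full = self.chunks.last().map_or(true, |c| c.len() == self.chunk_size);
        if last_is_full {
            self.chunks.push(Vec::with_capacity(self.chunk_size));
        }
        self.chunks.last_mut().expect("a chunk was just ensured").push(value);
    }

    fn remove(&mut self, index: usize) -> T {
        assert!(index < self.len(), "removal index out of bounds");
        let (chunk_idx, offset) = (index / self.chunk_size, index % self.chunk_size);
        // Remove inside the starting chunk; this shifts that chunk's tail left.
        let removed = self.chunks[chunk_idx].remove(offset);
        // Pull the head of every later chunk into the free slot at the end
        // of the chunk before it, so all chunks but the last stay full.
        for i in chunk_idx + 1..self.chunks.len() {
            let head = self.chunks[i].remove(0);
            self.chunks[i - 1].push(head);
        }
        // Release the last chunk if it ended up empty.
        if self.chunks.last().map_or(false, |c| c.is_empty()) {
            self.chunks.pop();
        }
        removed
    }
}
```

With a chunk size of 2 and the values 1 through 8, calling `remove(2)` on this sketch returns 3 and leaves the chunks as `[[1, 2], [4, 5], [6, 7], [8]]`. Every element after the removed one moves exactly once, so the total work stays linear.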

Alternative: swap_remove()

For cases where removal order doesn't matter and performance is critical, we could also implement swap_remove():

pub fn swap_remove(&mut self, index: usize) -> T {
    assert!(index < self.len);
    
    unsafe {
        let removed = ptr::read(/* get element at index */);
        
        // Move the last element to fill the gap
        if index != self.len - 1 {
            let last = ptr::read(/* get last element */);
            ptr::write(/* write to index position */, last);
        }
        
        self.len -= 1;
        removed
    }
}

This would be O(1) and maintain contiguity, similar to Vec::swap_remove().
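As a rough sketch of how this could look on a chunked layout, here is a safe version over a plain `Vec<Vec<T>>`. The function name `chunked_swap_remove` and the layout are assumptions for illustration, not the crate's API. Only the logically last slot is vacated, so no shifting is needed.

```rust
/// Hypothetical helper (illustration only). Assumes every chunk except
/// possibly the last holds exactly `chunk_size` elements and that no
/// chunk is empty.
fn chunked_swap_remove<T>(chunks: &mut Vec<Vec<T>>, chunk_size: usize, index: usize) -> T {
    assert!(chunk_size > 0, "chunk size must be positive");
    let len = match chunks.last() {
        Some(last) => (chunks.len() - 1) * chunk_size + last.len(),
        None => 0,
    };
    assert!(index < len, "swap_remove index out of bounds");

    // Take out the logically last element; this is the only vacated slot.
    let last = chunks
        .last_mut()
        .and_then(|chunk| chunk.pop())
        .expect("len > 0 implies a non-empty last chunk");
    if chunks.last().map_or(false, |c| c.is_empty()) {
        chunks.pop();
    }

    if index == len - 1 {
        // The target was the last element itself, so there is no gap to fill.
        return last;
    }
    // Put the former last element into the gap and hand back the old value.
    std::mem::replace(&mut chunks[index / chunk_size][index % chunk_size], last)
}
```

For example, with chunks `[[1, 2], [3, 4], [5]]` and a chunk size of 2, `chunked_swap_remove(&mut chunks, 2, 1)` returns 2 and leaves `[[1, 5], [3, 4]]`.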

Moving Forward

I'm completely open to other approaches and would love to hear your thoughts on this direction. The shift-based remove() seems like it would:

Maintain the current safety of our Drop implementation
Keep the data structure's behavior predictable and simple
Avoid the complexity of tracking partial initialization

What do you think about this approach? Are there other considerations I'm missing?

Thanks again for the excellent feedback – discussions like this are exactly what make open source development so valuable!

incapdns · 2025-08-13T04:07:06Z

incapdns
Aug 13, 2025
Collaborator

A few options come to mind, though I believe they are all already well known.

1. The Canonical Approach: memmove (Shifting Elements)
The standard and most intuitive way to remove an element at index i while maintaining order is to copy all elements from index i+1 onward one position to the left. This is the approach used by Rust's Vec::remove.
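The same shift can be written without unsafe code. The helper below is a small sketch (the name `remove_by_shift` is made up for this example): rotating the tail left by one moves the removed element into the last slot, where `pop` takes it out.

```rust
/// Hypothetical helper: removes `v[index]` by shifting the tail left,
/// doing the same O(n) work as `Vec::remove` with safe slice methods.
fn remove_by_shift<T>(v: &mut Vec<T>, index: usize) -> T {
    assert!(index < v.len(), "removal index out of bounds");
    // After the rotation, the element formerly at `index` sits in the last
    // slot and everything after it has moved one position to the left.
    v[index..].rotate_left(1);
    v.pop().expect("vector is non-empty because index < len")
}
```

Calling `remove_by_shift(&mut v, 1)` on `vec![1, 2, 3, 4, 5]` returns 2 and leaves `[1, 3, 4, 5]`.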

2. High-Performance Alternative for Removals: swap_remove
Although this approach doesn't maintain order, it's essential to know about for its performance gains. The swap_remove function (also present in Rust's Vec) removes an element by swapping it with the last element of the vector and then removing the tail.

3. Approaches Using Auxiliary Data Structures
When the O(n) cost of removal is prohibitive, auxiliary data structures can be used. The general idea is to mark an item as "removed" without shifting the others, and to handle these "holes" during iteration or in a subsequent compaction operation.

Using a FixedBitSet (or BitVec)
A FixedBitSet is a fixed-size bitmap. It can be used to track which "slots" in your CustomVec are valid and which have been removed.

How It Works:
Structure: Your CustomVec would contain the data vector and a FixedBitSet of the same size. The bit at position i indicates whether the element in data[i] is valid (true) or has been removed (false).

remove(index): Removing becomes a very fast operation. You simply access the FixedBitSet and set the bit at position index to false.

Iteration: When iterating over the CustomVec, you first need to query the FixedBitSet. If the bit at i is true, you process the element; otherwise, you ignore it.

Logical Index Access: The concept of "index" changes. The "third valid element" might not be at index 2. To find the kth element, you would need to iterate through the BitSet to find the kth position marked as true.

Compaction: Periodically, you can run a compact() function that creates a new array containing only the valid elements, freeing the "holes" and resetting the BitSet.
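The points above can be sketched as follows. To keep the example free of external crates, a `Vec<bool>` stands in for the FixedBitSet, and `TombstoneVec` with its method names is hypothetical. In this sketch `remove` only flips the flag and does not hand the value back; the value stays in place until `compact` runs.

```rust
/// Hypothetical lazy-deletion vector (illustration only).
struct TombstoneVec<T> {
    data: Vec<T>,
    live: Vec<bool>, // stand-in for a bitset: live[i] says whether data[i] is valid
}

impl<T> TombstoneVec<T> {
    fn new() -> Self {
        Self { data: Vec::new(), live: Vec::new() }
    }

    fn push(&mut self, value: T) {
        self.data.push(value);
        self.live.push(true);
    }

    /// O(1): marks the physical slot as removed.
    /// Returns whether the slot was live before the call.
    fn remove(&mut self, slot: usize) -> bool {
        std::mem::replace(&mut self.live[slot], false)
    }

    /// Iterates over live elements only, skipping the holes.
    fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        self.data
            .iter()
            .zip(&self.live)
            .filter(|(_, live)| **live)
            .map(|(value, _)| value)
    }

    /// O(n): the k-th live element has to be found by scanning.
    fn get_kth(&self, k: usize) -> Option<&T> {
        self.iter().nth(k)
    }

    /// Drops every removed element and resets all flags to live.
    fn compact(&mut self) {
        let mut flags = self.live.iter();
        // `retain` visits each element exactly once, in order,
        // so the flags line up with the slots.
        self.data.retain(|_| *flags.next().expect("one flag per slot"));
        self.live.clear();
        self.live.resize(self.data.len(), true);
    }
}
```

Note how the physical slot passed to `remove` and the logical position passed to `get_kth` diverge as soon as a hole exists, which is the index-access cost described above.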

| Approach | remove() | Index Access | Space Usage | Order-Preserving? | Ideal For |
| --- | --- | --- | --- | --- | --- |
| Shift (Default Vec::remove) | O(n) | O(1) | O(1) | Yes | General use, especially with few removals or removals at the end of the vector. |
| swap_remove | O(1) | O(1) | O(1) | No | Cases where order doesn't matter and removal performance is critical. |
| Auxiliary FixedBitSet | O(1) | O(n) (for kth) | O(n) | Yes | Many batch removals, where the cost can be amortized during iteration or compaction. |

Personally, I was thinking of implementing both variants: fn swap_remove, and fn remove using the traditional element-shifting model.

1 reply

QuarkPixel Aug 13, 2025
Maintainer Author

Perfect! Your analysis aligns exactly with my thinking. That comparison table is particularly helpful.

I completely agree that implementing both remove() and swap_remove() is the right path forward. This approach also ensures our current Drop implementation remains safe since we'll maintain element contiguity.

Since you mentioned you intended to implement fn remove, would you like to go ahead with that implementation? I'd be happy to review and collaborate on it.

Thanks for the thorough analysis!

incapdns · 2025-08-13T20:29:48Z

incapdns
Aug 13, 2025
Collaborator

Please review the fn remove implementation:

#4

I apologize for the delay. I was still at work at the time, and I have little spare time to dedicate to this.

@QuarkPixel

Detailed Analysis of the `remove` Function

Hello! This is a very efficient implementation for removal in a chunked vector. To aid in the review and future understanding of the unsafe logic, I have prepared a detailed explanation of how the operation works, with practical examples.

The remove function aims to remove an element at a specific index and shift all subsequent elements to the left, maintaining the integrity of the chunked structure.

The logic can be broken down into 3 main stages:

Removal and Shift in the Initial Chunk: The element is read, and the remaining items within the same chunk are moved to the left.
Shift Between Chunks: A loop moves the first element of each subsequent chunk to the end of the previous chunk, keeping the structure compact.
Cleanup and Finalization: The duplicate value left over in the last position after all shifts is invalidated to prevent a double drop, and len is updated.
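For readers who want to follow the stages in code, here is a simplified sketch. It is not the code from the PR: `RawChunked`, its fields, and the boxed-array layout are assumptions for illustration. In its last stage this sketch only decrements `len` and relies on `Drop` touching just the first `len` slots; it does not overwrite the stale slot, and it never frees an emptied trailing chunk.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical layout (illustration only): boxed fixed-size chunks where
/// only the first `len` logical slots are initialized.
struct RawChunked<T, const N: usize> {
    data: Vec<Box<[MaybeUninit<T>; N]>>,
    len: usize,
}

impl<T, const N: usize> RawChunked<T, N> {
    fn new() -> Self {
        assert!(N > 0, "chunk size must be positive");
        Self { data: Vec::new(), len: 0 }
    }

    fn push(&mut self, value: T) {
        if self.len == self.data.len() * N {
            self.data.push(Box::new(std::array::from_fn(|_| MaybeUninit::uninit())));
        }
        self.data[self.len / N][self.len % N].write(value);
        self.len += 1;
    }

    fn get(&self, index: usize) -> Option<&T> {
        if index >= self.len {
            return None;
        }
        // SAFETY: every slot below `len` is initialized.
        Some(unsafe { self.data[index / N][index % N].assume_init_ref() })
    }

    fn remove(&mut self, index: usize) -> T {
        assert!(index < self.len, "removal index out of bounds");
        let (first, offset) = (index / N, index % N);
        let last = (self.len - 1) / N; // chunk holding the final element
        // SAFETY: all reads touch initialized slots below `len`, and every
        // pointer stays within (or one past the end of) its own chunk.
        unsafe {
            // Stage 1: read the element, then close the gap inside its chunk.
            let chunk = self.data[first].as_mut_ptr() as *mut T;
            let removed = chunk.add(offset).read();
            let end = if first == last { (self.len - 1) % N } else { N - 1 };
            ptr::copy(chunk.add(offset + 1), chunk.add(offset), end - offset);
            // Stage 2: move each later chunk's head into the free last slot
            // of the previous chunk, then shift the rest of that chunk left.
            for i in first + 1..=last {
                let cur = self.data[i].as_mut_ptr() as *mut T;
                let prev = self.data[i - 1].as_mut_ptr() as *mut T;
                prev.add(N - 1).write(cur.read());
                let end = if i == last { (self.len - 1) % N } else { N - 1 };
                ptr::copy(cur.add(1), cur, end);
            }
            // Stage 3: the old last slot still holds stale bits, but after
            // this decrement it lies outside [0, len) and is never read.
            self.len -= 1;
            removed
        }
    }
}

impl<T, const N: usize> Drop for RawChunked<T, N> {
    fn drop(&mut self) {
        for i in 0..self.len {
            // SAFETY: only slots below `len` are initialized and dropped.
            unsafe { self.data[i / N][i % N].assume_init_drop() };
        }
    }
}
```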

Example 1: Removing an Element from the Middle

Let's analyze the most complete case: removing an element that forces shifting across multiple chunks.

Initial State: len = 8, N = 2.
Data: [[1, 2], [3, 4], [5, 6], [7, 8]]
Operation: remove(index: 2) (removing the value 3).

| Step | Operation | State of Chunks | `len` |
| --- | --- | --- | --- |
| 1 | Initial State | `[[1, 2], [3, 4], [5, 6], [7, 8]]` | 8 |
| 2 | `ptr::read(data[1][0])` | The value `3` is read and saved for return. The slot `data[1][0]` now contains invalid data. | 8 |
| 3 | Intra-Chunk Shift | `ptr::copy` moves `4` to position `0` of chunk 1. Chunk 1 becomes `[4, 4]`. | 8 |
| 4 | Inter-Chunk Shift (i=1) | The value `5` (from `data[2][0]`) is moved to the end of chunk 1 (`data[1][1]`). The rest of chunk 2 is shifted. | 8 |
|  | Partial Result | `[[1, 2], [4, 5], [6, 6], [7, 8]]` | 8 |
| 5 | Inter-Chunk Shift (i=2) | The value `7` (from `data[3][0]`) is moved to the end of chunk 2 (`data[2][1]`). The rest of chunk 3 is shifted. | 8 |
|  | Partial Result | `[[1, 2], [4, 5], [6, 7], [8, 8]]` | 8 |
| 6 | Cleanup (`MaybeUninit`) | The `if index < self.len - 1` (2 < 7) is true. The last slot (`data[3][1]`) contains a duplicate `8` and must be invalidated. | 8 |
|  | Result | `[[1, 2], [4, 5], [6, 7], [8, <uninit>]]` | 8 |
| 7 | Finalization | `self.len` is decremented, and the function returns the value `3`. | 7 |
| 8 | Final State | `[[1, 2], [4, 5], [6, 7], [8, <uninit>]]` | 7 |

This table shows why the ... = MaybeUninit::uninit() line is crucial. Without it, we would have two copies of the value 8, which would cause a double drop if T were a type like String.

Example 2: Removing the Last Element (Edge Case)

Now, for the case where the removal happens at the end of the vector.

Initial State: len = 8, N = 2.
Data: [[1, 2], [3, 4], [5, 6], [7, 8]]
Operation: remove(index: 7) (removing the value 8).

| Step | Operation | State of Chunks | `len` |
| --- | --- | --- | --- |
| 1 | Initial State | `[[1, 2], [3, 4], [5, 6], [7, 8]]` | 8 |
| 2 | `ptr::read(data[3][1])` | The value `8` is read and saved for return. | 8 |
| 3 | Intra-Chunk Shift | `count` is 0. The `ptr::copy` is skipped. No shifting occurs. | 8 |
| 4 | Inter-Chunk Shift | The `for` loop does not execute, as `current_chunk_idx` (`3`) is not less than `until_chunk_idx` (`3`). | 8 |
| 5 | Cleanup (`MaybeUninit`) | The condition `if index < self.len - 1` (7 < 7) is false. The cleanup line is correctly skipped. | 8 |
| 6 | Finalization | `self.len` is decremented, and the function returns the value `8`. | 7 |
| 7 | Final State | `[[1, 2], [3, 4], [5, 6], [7, 8]]` (the last slot `data[3][1]` is logically inaccessible). | 7 |

In this case, the subsequent truncate logic may remove the last chunk if it becomes entirely empty after the length is decreased. The absence of shifting makes the cleanup unnecessary, and the implemented if condition handles this perfectly.

I hope this analysis helps!

1 reply

QuarkPixel Aug 14, 2025
Maintainer Author

Thank you for the detailed analysis! Your step-by-step breakdown is very helpful.

I've made some modifications to your implementation based on testing. You can check the changes at: incapdns#2

The main change was removing the cleanup code that sets the last element to MaybeUninit::uninit(). After testing, I found this wasn't necessary and had some edge case issues. Following std::Vec's approach, we only need to guarantee validity within [0, len).
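To illustrate the [0, len) guarantee in isolation, here is a small self-contained sketch. `Buf`, `Counted`, and `drops_after_remove` are names invented for this example and are not part of the crate. The removed element is read out, the tail is shifted, and `len` is decremented; the stale bits left in the old last slot are never touched again because `Drop` only walks the first `len` slots.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::rc::Rc;

/// Hypothetical fixed-capacity buffer; only slots in [0, len) are initialized.
struct Buf<T, const N: usize> {
    slots: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> Buf<T, N> {
    fn new() -> Self {
        Self { slots: std::array::from_fn(|_| MaybeUninit::uninit()), len: 0 }
    }

    fn push(&mut self, value: T) {
        assert!(self.len < N, "buffer full");
        self.slots[self.len].write(value);
        self.len += 1;
    }

    fn remove(&mut self, index: usize) -> T {
        assert!(index < self.len, "removal index out of bounds");
        // SAFETY: `index < len`, so every slot read or moved is initialized.
        unsafe {
            let base = self.slots.as_mut_ptr() as *mut T;
            let removed = base.add(index).read();
            std::ptr::copy(base.add(index + 1), base.add(index), self.len - index - 1);
            // The old last slot keeps stale bits, but it is now outside [0, len).
            self.len -= 1;
            removed
        }
    }
}

impl<T, const N: usize> Drop for Buf<T, N> {
    fn drop(&mut self) {
        for slot in &mut self.slots[..self.len] {
            // SAFETY: slots below `len` are initialized and dropped exactly once.
            unsafe { slot.assume_init_drop() };
        }
    }
}

/// Element type that counts how many times a drop runs.
struct Counted(Rc<Cell<usize>>);

impl Drop for Counted {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Pushes four elements, removes one, and reports the total number of drops.
fn drops_after_remove() -> usize {
    let counter = Rc::new(Cell::new(0));
    {
        let mut buf: Buf<Counted, 4> = Buf::new();
        for _ in 0..4 {
            buf.push(Counted(Rc::clone(&counter)));
        }
        let removed = buf.remove(1);
        drop(removed);
    } // `buf` is dropped here and releases the three remaining elements
    counter.get()
}
```

`drops_after_remove()` returns 4: one drop for the removed element and three for the elements still in the buffer, so nothing is dropped twice even though no slot was reset.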

Your original logic was sound, and the examples really helped understand the cross-chunk shifting. No need to apologize for delayed responses - that's completely normal! Thanks for the contribution.
