Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 15, 2025

During rebuild_index, Fjall partitions are deleted then immediately recreated. The open_partition call fails with PartitionDeleted because Fjall retains stale internal references to the deleted partition.

Changes

  • Added open_partition_with_retry(): Handles PartitionDeleted errors by retrying with a 50ms delay for file system cleanup
  • Updated open_map(): Delegates to retry helper instead of direct open_partition call
  • Updated remove_map(): Added debug logging for already-deleted partitions

Implementation

fn open_partition_with_retry(...) -> NitriteResult<PartitionHandle> {
    match ks.open_partition(name, config_clone.clone()) {
        Ok(partition) => Ok(partition),
        Err(err) if is_partition_deleted_error(&err.to_string()) => {
            // Retry up to 2 more times with 50ms delay for FS cleanup
            // Fjall's open_partition creates partition if it doesn't exist
        }
        Err(err) => Err(to_nitrite_error(err))
    }
}

The retry mechanism accounts for the timing window where Fjall's partition deletion has completed but internal cleanup is still in progress. Fjall's open_partition has create-if-not-exists semantics, so subsequent calls create a fresh partition.

Original prompt

Root Cause Analysis

The current fix in PR #1 addresses the wrong part of the code flow. The actual issue is:

  1. During rebuild_index, the indexer calls drop_index() which deletes the Fjall partition and removes it from the registry
  2. Immediately after, write_index_entry() is called to rebuild the index
  3. write_index_entry() tries to find the index, doesn't find it (because it was just dropped), and calls create_nitrite_index()
  4. create_nitrite_index() creates a new SimpleIndex or CompoundIndex, which tries to open the map from the store
  5. The Fjall store's open_map() method tries to open a partition that was just deleted
  6. This causes PartitionDeleted error because:
    • The partition was deleted but Fjall's keyspace still has a reference to it
    • When trying to open_partition(), Fjall detects it was deleted and throws the error

The Real Fix Needed

The issue is that Fjall's open_partition() should handle the case where a partition was deleted and needs to be recreated. Currently, when you try to open a deleted partition, it throws PartitionDeleted instead of creating a new one.

We need to modify the Fjall adapter's open_map function to:

  1. Detect when open_partition fails with PartitionDeleted error
  2. In that case, delete the stale partition reference from Fjall's keyspace
  3. Create a fresh partition instead

Implementation

Update nitrite-fjall-adapter/src/store.rs in the open_map function:

fn open_map(&self, name: &str, fjall_store: FjallStore) -> NitriteResult<NitriteMap> {
    let mut closed = false;
    if let Some(map) = self.map_registry.get(name) {
        if map.is_closed()? {
            closed = true;
        } else {
            return Ok(NitriteMap::new(map.clone()));
        }
    }

    // remove the map from registry if it is closed
    if closed {
        self.map_registry.remove(name);
    }

    if let Some(ks) = self.keyspace.get() {
        let config = self.store_config.partition_config();
        
        // Try to open the partition
        let partition_result = ks.open_partition(name, config.clone());
        
        let partition = match partition_result {
            Ok(p) => p,
            Err(err) => {
                let err_msg = err.to_string();
                
                // If partition was deleted, we need to recreate it
                if Self::is_partition_deleted_error(&err_msg) {
                    log::warn!("Partition '{}' was deleted, recreating it", name);
                    
                    // Clean up any stale references in the registry
                    self.map_registry.remove(name);
                    
                    // Try to delete any stale partition handle from keyspace
                    // This is a best-effort operation - if it fails, we proceed anyway
                    let _ = ks.open_partition(name, config.clone())
                        .and_then(|p| ks.delete_partition(p));
                    
                    // Now create a fresh partition
                    ks.open_partition(name, config.clone())
                        .map_err(|e| {
                            log::error!("Failed to recreate partition '{}': {}", name, e);
                            to_nitrite_error(e)
                        })?
                } else {
                    log::error!("Failed to open partition '{}': {}", name, err);
                    return Err(to_nitrite_error(err));
                }
            }
        };

        let fjall_map = FjallMap::new(
            name.to_string(),
            partition,
            fjall_store,
            self.store_config.clone(),
        );
        fjall_map.initialize()?;

        self.map_registry
            .insert(name.to_string(), fjall_map.clone());
        Ok(NitriteMap::new(fjall_map))
    } else {
        Err(NitriteError::new(
            "Keyspace is not initialized",
            ErrorKind::PluginError,
        ))
    }
}

Additional Improvements

Also ensure that remove_map properly handles already-deleted partitions without propagating errors:

fn remove_map(&self, name: &str) -> NitriteResult<()> {
    // close the map if it is open to drop any partition handles held by the map
    self.close_map(name)?;
    
    if let Some(ks) = self.keyspace.get() {
        let options = self.store_config.partition_config();
        match ks.open_partition(name, options) {
            Ok(partition) => {
                match ks.delete_partition(partition.clone()) {
                    Ok(_) => {
                        self.map_registry.remove(name);
                        Ok(())
                    }
                    Err(err) => {
                        log::error!("Failed to remove partition: {}", err);
                        Err(to_nitrite_error(err))
                    }
                }
            }
            Err(err) => {
                let err_msg = err.to_string();
     ...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created as a result of the following prompt from Copilot chat.*
> ## Root Cause Analysis
> 
> The current fix in PR #1 addresses the wrong part of the code flow. The actual issue is:
> 
> 1. During `rebuild_index`, the indexer calls `drop_index()` which deletes the Fjall partition and removes it from the registry
> 2. Immediately after, `write_index_entry()` is called to rebuild the index
> 3. `write_index_entry()` tries to find the index, doesn't find it (because it was just dropped), and calls `create_nitrite_index()`
> 4. `create_nitrite_index()` creates a new `SimpleIndex` or `CompoundIndex`, which tries to **open** the map from the store
> 5. The Fjall store's `open_map()` method tries to open a partition that was just deleted
> 6. This causes `PartitionDeleted` error because:
>    - The partition was deleted but Fjall's keyspace still has a reference to it
>    - When trying to `open_partition()`, Fjall detects it was deleted and throws the error
> 
> ## The Real Fix Needed
> 
> The issue is that Fjall's `open_partition()` should handle the case where a partition was deleted and needs to be **recreated**. Currently, when you try to open a deleted partition, it throws `PartitionDeleted` instead of creating a new one.
> 
> We need to modify the Fjall adapter's `open_map` function to:
> 
> 1. Detect when `open_partition` fails with `PartitionDeleted` error
> 2. In that case, **delete the stale partition reference** from Fjall's keyspace
> 3. **Create a fresh partition** instead
> 
> ## Implementation
> 
> Update `nitrite-fjall-adapter/src/store.rs` in the `open_map` function:
> 
> ```rust
> fn open_map(&self, name: &str, fjall_store: FjallStore) -> NitriteResult<NitriteMap> {
>     let mut closed = false;
>     if let Some(map) = self.map_registry.get(name) {
>         if map.is_closed()? {
>             closed = true;
>         } else {
>             return Ok(NitriteMap::new(map.clone()));
>         }
>     }
> 
>     // remove the map from registry if it is closed
>     if closed {
>         self.map_registry.remove(name);
>     }
> 
>     if let Some(ks) = self.keyspace.get() {
>         let config = self.store_config.partition_config();
>         
>         // Try to open the partition
>         let partition_result = ks.open_partition(name, config.clone());
>         
>         let partition = match partition_result {
>             Ok(p) => p,
>             Err(err) => {
>                 let err_msg = err.to_string();
>                 
>                 // If partition was deleted, we need to recreate it
>                 if Self::is_partition_deleted_error(&err_msg) {
>                     log::warn!("Partition '{}' was deleted, recreating it", name);
>                     
>                     // Clean up any stale references in the registry
>                     self.map_registry.remove(name);
>                     
>                     // Try to delete any stale partition handle from keyspace
>                     // This is a best-effort operation - if it fails, we proceed anyway
>                     let _ = ks.open_partition(name, config.clone())
>                         .and_then(|p| ks.delete_partition(p));
>                     
>                     // Now create a fresh partition
>                     ks.open_partition(name, config.clone())
>                         .map_err(|e| {
>                             log::error!("Failed to recreate partition '{}': {}", name, e);
>                             to_nitrite_error(e)
>                         })?
>                 } else {
>                     log::error!("Failed to open partition '{}': {}", name, err);
>                     return Err(to_nitrite_error(err));
>                 }
>             }
>         };
> 
>         let fjall_map = FjallMap::new(
>             name.to_string(),
>             partition,
>             fjall_store,
>             self.store_config.clone(),
>         );
>         fjall_map.initialize()?;
> 
>         self.map_registry
>             .insert(name.to_string(), fjall_map.clone());
>         Ok(NitriteMap::new(fjall_map))
>     } else {
>         Err(NitriteError::new(
>             "Keyspace is not initialized",
>             ErrorKind::PluginError,
>         ))
>     }
> }
> ```
> 
> ## Additional Improvements
> 
> Also ensure that `remove_map` properly handles already-deleted partitions without propagating errors:
> 
> ```rust
> fn remove_map(&self, name: &str) -> NitriteResult<()> {
>     // close the map if it is open to drop any partition handles held by the map
>     self.close_map(name)?;
>     
>     if let Some(ks) = self.keyspace.get() {
>         let options = self.store_config.partition_config();
>         match ks.open_partition(name, options) {
>             Ok(partition) => {
>                 match ks.delete_partition(partition.clone()) {
>                     Ok(_) => {
>                         self.map_registry.remove(name);
>                         Ok(())
>                     }
>                     Err(err) => {
>                         log::error!("Failed to remove partition: {}", err);
>                         Err(to_nitrite_error(err))
>                     }
>                 }
>             }
>             Err(err) => {
>                 let err_msg = err.to_string();
>                 
>                 // If partition doesn't exist or was already deleted, that's OK
>                 if Self::is_partition_deleted_error(&err_msg) {
>                     self.map_registry.remove(name);
>                     log::debug!("Partition '{}' was already deleted", name);
>                     Ok(())
>                 } else {
>                     log::error!("Failed to open partition for removal: {}", err);
>                     Err(to_nitrite_error(err))
>                 }
>             }
>         }
>     } else {
>         Err(NitriteError::new(
>             "Keyspace is not initialized",
>             ErrorKind::PluginError,
>         ))
>     }
> }
> ```
> 
> ## Testing
> 
> This fix ensures that:
> 1. When `rebuild_index` drops an index partition and immediately tries to recreate it, the Fjall adapter detects the deleted state
> 2. Instead of failing with `PartitionDeleted`, it recreates the partition fresh
> 3. The index rebuild completes successfully
> 4. All three failing tests pass:
>    - `repository::repository_modification_test::test_rebuild_index`
>    - `collection::single_field_index_test::test_rebuild_index`
>    - `collection::compound_index_test::test_rebuild_index`
> 
> The key insight is that Fjall's `open_partition` should be treated as "open or create" semantics when dealing with deleted partitions, similar to how databases typically handle this scenario.
> 

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 We'd love your input! Share your thoughts on Copilot coding agent in our [2 minute survey](https://gh.io/copilot-coding-agent-survey).

@coderabbitai
Copy link

coderabbitai bot commented Dec 15, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Note

Free review on us!

CodeRabbit is offering free reviews until Wed Dec 17 2025 to showcase some of the refinements we've made.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI and others added 3 commits December 15, 2025 08:16
… partitions

Co-authored-by: anidotnet <696662+anidotnet@users.noreply.github.com>
… issues

Co-authored-by: anidotnet <696662+anidotnet@users.noreply.github.com>
…mentation

Co-authored-by: anidotnet <696662+anidotnet@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix open partition issue in index rebuilding process Fix PartitionDeleted error during index rebuild by handling stale partition references Dec 15, 2025
Copilot AI requested a review from anidotnet December 15, 2025 08:32
@anidotnet anidotnet marked this pull request as ready for review December 15, 2025 08:34
Copilot AI review requested due to automatic review settings December 15, 2025 08:34
@anidotnet anidotnet merged commit 0456f69 into copilot/fix-rebuild-index-tests Dec 15, 2025
4 checks passed
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a PartitionDeleted error that occurs during index rebuild operations when Fjall partitions are deleted and immediately recreated, causing stale internal references.

Key Changes:

  • Added open_partition_with_retry() helper function with retry logic and delays for handling deleted partition scenarios
  • Refactored open_map() to delegate to the retry helper instead of directly calling open_partition
  • Enhanced remove_map() error handling with debug logging for already-deleted partitions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +293 to +306
// If the retry also fails, wait a moment for file system cleanup
log::debug!("First retry failed, waiting briefly for cleanup: {}", retry_err);

// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));

// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry logic unconditionally retries after sleep regardless of the error type. If the second attempt (line 290) fails for a reason other than PartitionDeleted (e.g., permission error, disk full), the code still sleeps and retries. Consider checking if retry_err is also a PartitionDeleted error before sleeping, or at minimum logging the error type to help diagnose non-transient failures.

Suggested change
// If the retry also fails, wait a moment for file system cleanup
log::debug!("First retry failed, waiting briefly for cleanup: {}", retry_err);
// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));
// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
let retry_err_msg = retry_err.to_string();
if Self::is_partition_deleted_error(&retry_err_msg) {
log::debug!("First retry failed with PartitionDeleted, waiting briefly for cleanup: {}", retry_err);
// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));
// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
} else {
log::error!(
"Failed to recreate partition '{}' on retry: {} (error type: {})",
name, retry_err, retry_err_msg
);
Err(to_nitrite_error(retry_err))
}

Copilot uses AI. Check for mistakes.
Comment on lines +259 to +263
///
/// This handles the case where a partition was deleted (e.g., during index rebuild)
/// and needs to be recreated. When Fjall reports a partition is deleted, we simply
/// try to open it again, which will create a new partition.
///
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation states "try to open it again" but the implementation actually tries to open it twice more (line 290 and line 302), for a total of 3 attempts. The documentation should clarify that up to 3 total attempts are made: initial attempt, immediate retry, and retry after delay.

Suggested change
///
/// This handles the case where a partition was deleted (e.g., during index rebuild)
/// and needs to be recreated. When Fjall reports a partition is deleted, we simply
/// try to open it again, which will create a new partition.
///
///
/// This handles the case where a partition was deleted (e.g., during index rebuild)
/// and needs to be recreated. When Fjall reports a partition is deleted, this method
/// will attempt to open the partition up to three times: the initial attempt, an immediate
/// retry if the partition was deleted, and a final retry after a brief delay if the second
/// attempt also fails. This ensures that transient file system cleanup issues are handled
/// gracefully and a new partition is created if needed.
///

Copilot uses AI. Check for mistakes.
Comment on lines +293 to +306
// If the retry also fails, wait a moment for file system cleanup
log::debug!("First retry failed, waiting briefly for cleanup: {}", retry_err);

// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));

// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The immediate retry attempt (line 290) doesn't check if the error is a PartitionDeleted error before proceeding to the delayed retry. This means even non-transient errors (e.g., permission denied, invalid configuration) will trigger the sleep-and-retry path. Consider checking if retry_err indicates a transient PartitionDeleted error before proceeding to the delayed retry, otherwise fail fast for unrelated errors.

Suggested change
// If the retry also fails, wait a moment for file system cleanup
log::debug!("First retry failed, waiting briefly for cleanup: {}", retry_err);
// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));
// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
// Only proceed to delayed retry if this is still a PartitionDeleted error
let retry_err_msg = retry_err.to_string();
if Self::is_partition_deleted_error(&retry_err_msg) {
log::debug!("First retry failed with PartitionDeleted, waiting briefly for cleanup: {}", retry_err);
// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));
// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
} else {
log::error!("Failed to recreate partition '{}' on retry: {}", name, retry_err);
Err(to_nitrite_error(retry_err))
}

Copilot uses AI. Check for mistakes.
Comment on lines +273 to +276
// Clone config once to avoid multiple clones in retry paths
let config_clone = config.clone();

match ks.open_partition(name, config_clone.clone()) {
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config parameter is cloned unnecessarily. Since config is already a reference, you can use it directly in the open_partition calls. The intermediate config_clone variable and its subsequent clones add overhead without providing any benefit. Consider removing line 274 and using config.clone() directly in the open_partition calls.

Copilot uses AI. Check for mistakes.
@anidotnet anidotnet deleted the copilot/fix-open-partition-issue branch December 16, 2025 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants