feat(cli): database pruning tool for historical trie data#123

Merged
kariy merged 28 commits into main from
devin/1749799396-database-pruning-tool
Jun 17, 2025

Conversation

@devin-ai-integration
Contributor

@devin-ai-integration devin-ai-integration bot commented Jun 13, 2025

Database Pruning Tool for Historical Trie Data

Overview

This PR implements a database pruning tool for the katana db utility command to address the exponential growth of historical trie data in the Katana database. The tool provides two pruning modes to help manage database size while preserving data integrity.

Features

  • Prune All History: Removes all historical trie data, keeping only the latest trie state
  • Keep Last N Blocks: Removes historical data older than the specified number of blocks
  • Safety Checks: Validates inputs and provides warnings to prevent accidental data loss
  • Progress Monitoring: Real-time progress bars showing pruning status with time estimates
  • Efficient Operations:
    • Uses database cursors for bulk deletions
    • Batch processing (1000 entries at a time) to minimize memory usage
    • Transaction-based operations with proper error handling
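The cursor-and-batch approach described above can be sketched in standalone Rust. This is an illustrative stand-in, not the PR's actual code: a `BTreeMap` keyed by `(block_number, key)` plays the role of a dup-sorted MDBX table, and the hypothetical `prune_history` deletes everything at or below a cutoff block while holding at most one batch of keys in memory:

```rust
use std::collections::BTreeMap;

const BATCH_SIZE: usize = 1000;

/// Delete every entry whose block number is <= `cutoff`, collecting keys in
/// batches of at most BATCH_SIZE so memory use stays bounded. Returns the
/// number of entries deleted.
fn prune_history(table: &mut BTreeMap<(u64, u64), Vec<u8>>, cutoff: u64) -> usize {
    let mut deleted = 0;
    loop {
        // One "cursor pass": collect up to BATCH_SIZE keys at or below the cutoff.
        let batch: Vec<(u64, u64)> = table
            .range(..=(cutoff, u64::MAX))
            .take(BATCH_SIZE)
            .map(|(key, _)| *key)
            .collect();
        if batch.is_empty() {
            break; // nothing left below the cutoff
        }
        for key in batch {
            table.remove(&key);
            deleted += 1;
        }
    }
    deleted
}
```

On a toy table with 10 blocks and 3 trie keys per block, `prune_history(&mut table, 6)` deletes the 21 entries for blocks 0 through 6 and leaves the 9 entries for blocks 7 through 9 untouched.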

Implementation Details

CLI Interface

# Remove all historical trie data
katana db prune --latest -p path/to/db

# Keep only the last 1000 blocks of historical data
katana db prune --keep-last 1000 -p path/to/db

Database Tables Affected

The pruning tool operates on six trie-related tables:

  • History Tables: ClassesTrieHistory, ContractsTrieHistory, StoragesTrieHistory
  • Changeset Tables: ClassesTrieChangeSet, ContractsTrieChangeSet, StoragesTrieChangeSet

Safety Features

  • Input validation (blocks > 0)
  • Warning when requested blocks exceed available blocks (no-op behavior)
  • Preserves current trie state tables (ClassesTrie, ContractsTrie, StoragesTrie)
  • Transaction-based operations with proper commit/rollback handling
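The no-op guard described above can be reduced to a small helper. The name `prune_cutoff` and its exact shape are hypothetical, shown only to pin down the intended semantics:

```rust
/// Return the highest block number whose history should be pruned when
/// keeping the last `keep_last` blocks, or `None` when the request covers
/// more blocks than exist (the documented warn-and-no-op case).
fn prune_cutoff(latest_block: u64, keep_last: u64) -> Option<u64> {
    // The CLI layer already rejects 0 via clap's range validation, but a
    // library-level guard keeps the invariant explicit.
    assert!(keep_last > 0, "keep_last must be at least 1");
    if keep_last > latest_block {
        return None; // warn and leave the database untouched
    }
    Some(latest_block - keep_last)
}
```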

Progress Visualization

  • Uses indicatif (v0.17.8) for professional progress bars
  • Shows table name, entries processed, elapsed time, and processing speed
  • Multi-level progress bars for overall and per-table progress

Testing

  • ✅ Successfully builds with cargo build --bin katana
  • ✅ Passes all linting checks with scripts/clippy.sh
  • ✅ Code formatted with cargo +nightly fmt --all
  • ✅ Comprehensive test coverage using production-like data insertion
  • ✅ Tests verify state roots remain unchanged after pruning
  • ✅ Edge case testing (e.g., requested blocks > available blocks)

Test Improvements

  • Tests now use BlockWriter::insert_block_with_states_and_receipts for production-like setup
  • Verification uses StateFactoryProvider to check historical state accessibility
  • State root verification ensures data integrity is maintained
  • Added test case for when requested blocks exceed available blocks

Technical Implementation

  • Uses DbTxMut for write transactions with proper commit/rollback handling
  • Implements efficient cursor-based iteration for bulk operations
  • Handles IntegerSet (RoaringTreemap) data structure for changeset tables
  • Batch processing (1000 entries) to reduce memory usage during changeset pruning
  • Progress monitoring with time estimates and processing speed visualization
  • Follows existing code patterns and error handling conventions
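For the changeset tables, each key maps to a block list (an `IntegerSet` backed by `RoaringTreemap` in the real code). Pruning means trimming every block at or below the cutoff and deleting any key whose list becomes empty. A sketch using `BTreeSet` as a stand-in for the roaring bitmap (illustrative, not the PR's implementation):

```rust
use std::collections::BTreeSet;

/// Trim a changeset entry, dropping every block number <= `cutoff`.
/// `None` signals that the entry is now empty and its key can be deleted.
fn prune_block_list(blocks: &BTreeSet<u64>, cutoff: u64) -> Option<BTreeSet<u64>> {
    // Keep only blocks strictly above the cutoff.
    let kept: BTreeSet<u64> = blocks.range(cutoff + 1..).copied().collect();
    if kept.is_empty() {
        None
    } else {
        Some(kept)
    }
}
```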

Files Changed

  • bin/katana/src/cli/db/prune.rs: Added pruning functionality with progress monitoring
  • bin/katana/tests/db_prune.rs: Comprehensive test suite with state root verification
  • bin/katana/Cargo.toml: Added indicatif dependency for progress bars
  • Other supporting files for the database utility structure

Link to Devin run

https://app.devin.ai/sessions/746969faebe44c02aa1ee1ff518951fb

Requested by

karii@cartridge.gg

@devin-ai-integration
Contributor Author

Original prompt from karii@cartridge.gg:

Currently, we have a problem where the Katana database is growing at an exponential rate. One primary factor is that it is storing all historical state trie data since the genesis block. 

These are the table definitions for everything related to the state trie:

```rust
// crates/storage/db/src/tables.rs
    /// Class trie
    ClassesTrie: (TrieDatabaseKey) => TrieDatabaseValue,
    /// Contract trie
    ContractsTrie: (TrieDatabaseKey) => TrieDatabaseValue,
    /// Contract storage trie
    StoragesTrie: (TrieDatabaseKey) => TrieDatabaseValue,

    /// Class trie history
    ClassesTrieHistory: (BlockNumber, TrieDatabaseKey) => TrieHistoryEntry,
    /// Contract trie history
    ContractsTrieHistory: (BlockNumber, TrieDatabaseKey) => TrieHistoryEntry,
    /// Contract storage trie history
    StoragesTrieHistory: (BlockNumber, TrieDatabaseKey) => TrieHistoryEntry,

    /// Class trie change set
    ClassesTrieChangeSet: (TrieDatabaseKey) => BlockList,
    /// Contract trie change set
    ContractsTrieChangeSet: (TrieDatabaseKey) => BlockList,
    /// Contract storage trie change set
    StoragesTrieChangeSet: (TrieDatabaseKey) => BlockList
```

and this is how the historical trie is being stored. The snapshot method is being called every time the trie is committed. And the trie is committed on every new block.

    fn snapshot(&mut self, id: CommitId) {
        let block_number: BlockNumber = id.into();

        let entries = std::mem::take(&mut self.write_cache);
        let entries = entries.into_iter().map(|(key, value)| TrieHistoryEntry { key, value });

        for entry in entries {
            let mut set = self
                .tx
                .get::<Tb::Changeset>(entry.key.clone())
                .expect("failed to get trie change set")
                .unwrap_or_default();
            set.insert(block_number);

            self.tx
                .put::<Tb::Changeset>(entry.key.clone(), set)
                .expect("failed to put trie change set");

            self.tx
                .put::<Tb::History>(block_number, entry)
                .expect("failed to put trie history entry");
        }
    }

The trie is committed by calling BonsaiTrie::commit. Note: BonsaiTrie is a base type used to derive more specific tries (e.g., ClassesTrie or ContractsTrie).

impl<DB, Hash> BonsaiTrie<DB, Hash>
where
    DB: BonsaiDatabase + BonsaiPersistentDatabase<CommitId>,
    Hash: StarkHash + Send + Sync,
{
    pub fn insert(&mut self, id: &[u8], key: Felt, value: Felt) {
        let key: BitVec = key.to_bytes_be().as_bits()[5..].to_owned();
        self.storage.insert(id, &key, &value).unwrap();
    }

    pub fn commit(&mut self, id: CommitId) {
        self.storage.commit(id).expect("failed to commit trie");
    }
}

Your job is to create a database pruning tool. The tool should be part of the katana db utility command. It should allow pruning: (1) all historical data (keeping only the latest trie), or (2) everything except the last N blocks.

You only need to look in the following repo: dojoengine/katana

@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@github-actions

github-actions bot commented Jun 13, 2025

Binary size report 📊

Branch                                   Size
main                                     65.91 MiB
devin/1749799396-database-pruning-tool   66.19 MiB (+0.43%)

@codecov

codecov bot commented Jun 13, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 128 lines in your changes missing coverage. Please review.

Project coverage is 73.52%. Comparing base (bf21e05) to head (2172214).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
bin/katana/src/cli/db/stats.rs 0.00% 81 Missing ⚠️
bin/katana/src/cli/db/version.rs 0.00% 14 Missing ⚠️
bin/katana/src/cli/db/prune.rs 91.59% 10 Missing ⚠️
crates/storage/db/src/models/trie.rs 0.00% 10 Missing ⚠️
bin/katana/src/cli/db/mod.rs 61.90% 8 Missing ⚠️
crates/storage/db/src/models/list.rs 50.00% 3 Missing ⚠️
bin/katana/src/main.rs 0.00% 1 Missing ⚠️
crates/primitives/src/state.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #123      +/-   ##
==========================================
+ Coverage   73.27%   73.52%   +0.24%     
==========================================
  Files         201      205       +4     
  Lines       22115    22267     +152     
==========================================
+ Hits        16205    16372     +167     
+ Misses       5910     5895      -15     


devin-ai-integration bot and others added 8 commits June 16, 2025 21:58
- Add prune subcommand to katana db utility
- Support two modes: prune all history or keep last N blocks
- Include safety checks and proper error handling
- Efficiently handle bulk deletions using database cursors

Co-Authored-By: karii@cartridge.gg <evergreenkary@gmail.com>
…d import

- Fix double reference issue in get_latest_block_number call
- Change function signature to use DbTx instead of DbTxRef
- Remove unused DbTxRef import to clean up warnings

Co-Authored-By: karii@cartridge.gg <evergreenkary@gmail.com>
- Add custom Arbitrary implementation for TrieDatabaseKey with 256-byte limit
- Add arbitrary feature to katana-db Cargo.toml with optional dependency
- Create comprehensive integration tests in pruning_tests.rs
- Test pruning modes: all history removal and keep-last N blocks
- Verify current trie state preservation during pruning operations
- Test edge cases including empty database and key generation limits
- All tests pass with proper type conversions and trait imports

Co-Authored-By: karii@cartridge.gg <evergreenkary@gmail.com>
- Replace manual generate_test_key function with TrieDatabaseKey::arbitrary()
- Update populate_test_data to use Arbitrary-generated keys for better test coverage
- Remove unused TrieDatabaseKeyType import
- All tests continue to pass with randomly generated keys respecting 256-byte constraint
- Demonstrates Arbitrary trait implementation works correctly in practice

Co-Authored-By: karii@cartridge.gg <evergreenkary@gmail.com>
@kariy kariy force-pushed the devin/1749799396-database-pruning-tool branch from 98f0ed7 to d3d27e6 on June 16, 2025 14:16
@claude
Contributor

claude bot commented Jun 16, 2025

Claude finished @devin-ai-integration[bot]'s task.


Todo List

  • Read main implementation file (bin/katana/src/cli/db.rs)
  • Review test file (bin/katana/tests/db_prune.rs)
  • Check other changed files for context
  • Analyze code quality and best practices
  • Identify potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage
  • Provide comprehensive review feedback

PR Review: Database Pruning Tool

Thank you for implementing this database pruning tool. The feature addresses an important need for managing database growth. Here's my comprehensive review:

✅ Code Quality and Best Practices

Strengths:

  • Clean separation of concerns with distinct pruning modes
  • Good use of Rust idioms (Result types, pattern matching)
  • Proper error handling with contextual messages
  • Transaction-based operations ensure atomicity
  • Follows the codebase conventions for imports and formatting

Suggestions:

  • Consider adding more detailed documentation for the pruning algorithms, especially the cursor-based iteration logic
  • The prune_changeset_table function could benefit from explaining why we collect keys before updating (avoiding cursor invalidation)

🐛 Potential Bugs or Issues

  1. Double Reference Issue (line 206 in db.rs):

    let latest_block = get_latest_block_number(&&tx)?;

    Should be:

    let latest_block = get_latest_block_number(&tx)?;
  2. Typo in Comment (line 72):

    • "since tha latest block" → "since the latest block"
  3. Test File Issues (db_prune.rs):

    • Line 169-170, 184-185: The test is calling methods directly on db but should use db.open_ro() first
    • Line 173: Should be assert_eq!(initial_headers_count, 16); (0-15 = 16 blocks)
    • Line 203: Same issue, should be 16 not 11
  4. Edge Case in KeepLastN:

    • When blocks == latest_block, the function returns early with just a warning. Consider if this should prune all history instead

⚡ Performance Considerations

Positive:

  • Cursor-based iteration is efficient for bulk operations
  • Collecting keys before updates in prune_changeset_table avoids cursor invalidation
  • Transaction batching minimizes write overhead

Concerns:

  • The prune_changeset_table function loads all keys into memory before processing. For very large databases, this could cause memory issues
  • Consider implementing progress reporting for long-running operations
  • No concurrent operations support - the entire database is locked during pruning

Suggestion: Consider processing changesets in batches to reduce memory usage:

const BATCH_SIZE: usize = 1000;
let mut batch = Vec::with_capacity(BATCH_SIZE);
// Process in batches...
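Fleshed out, that fragment might look like the following generic sketch (an illustration of the batching idea, not the code that eventually landed in the PR):

```rust
const BATCH_SIZE: usize = 1000;

/// Feed `keys` to `apply` in chunks of at most BATCH_SIZE, so peak memory
/// stays bounded no matter how large the table is. Returns the number of
/// batches processed.
fn process_in_batches<K>(
    keys: impl IntoIterator<Item = K>,
    mut apply: impl FnMut(&[K]),
) -> usize {
    let mut batch = Vec::with_capacity(BATCH_SIZE);
    let mut batches = 0;
    for key in keys {
        batch.push(key);
        if batch.len() == BATCH_SIZE {
            apply(&batch); // flush a full batch
            batch.clear();
            batches += 1;
        }
    }
    if !batch.is_empty() {
        apply(&batch); // flush the final partial batch
        batches += 1;
    }
    batches
}
```

Processing 2,500 keys this way yields two full batches and one partial one, while never holding more than 1,000 keys at a time.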

🔒 Security Concerns

  1. Data Loss Risk: While there are warnings, consider adding a --force flag for destructive operations
  2. Path Expansion: Good use of shellexpand for path handling, preventing directory traversal issues
  3. No Backup Mechanism: Consider adding a recommendation to backup before pruning
  4. Permissions: The tool correctly uses appropriate read/write permissions

🧪 Test Coverage

Coverage Gaps:

  1. Missing test for error cases (invalid block numbers, database errors)
  2. No test for the edge case where requested blocks > available blocks
  3. Missing test for empty database scenario
  4. No integration test for the actual CLI invocation with error cases
  5. Missing tests for changeset pruning logic specifically

Test Improvements Needed:

  • Add tests for concurrent access scenarios
  • Test pruning with very large datasets
  • Add tests for transaction rollback on errors
  • Verify that current trie state tables remain untouched

Additional Recommendations

  1. Monitoring: Add metrics/logging for:

    • Number of entries pruned
    • Time taken for operation
    • Database size before/after
  2. Validation: Consider adding a --dry-run flag to preview what would be pruned

  3. Recovery: Document recovery procedures if pruning is interrupted

  4. CLI UX: The command structure is good, but consider adding:

    • Confirmation prompt for destructive operations
    • Progress bar for long operations
    • Summary statistics after completion

Summary

This is a solid implementation that addresses the core requirements. The main issues are minor (typos, test fixes) with some opportunities for enhancement around memory efficiency and test coverage. The transaction-based approach ensures data integrity, which is crucial for a pruning tool.

Priority fixes:

  1. Fix the double reference bug
  2. Correct the test assertions
  3. Add basic error case tests
  4. Consider memory usage for large databases

Overall, good work on implementing this important feature! With these improvements, it will be a robust solution for managing database growth.

@kariy kariy marked this pull request as draft June 16, 2025 14:26
@kariy
Member

kariy commented Jun 16, 2025

@claude Let's update the tests (in bin/katana/tests/db_prune.rs) to be more reliable by actually computing the trie. Instead of inserting the individual database entries manually, let's insert them through the trie provider trait.

impl<Db: Database> TrieWriter for DbProvider<Db> {
    fn trie_insert_declared_classes(
        &self,
        block_number: BlockNumber,
        updates: &BTreeMap<ClassHash, CompiledClassHash>,
    ) -> ProviderResult<Felt> { .. }

    fn trie_insert_contract_updates(
        &self,
        block_number: BlockNumber,
        state_updates: &StateUpdates,
    ) -> ProviderResult<Felt> { .. }
}

To ensure the pruning process is done correctly, we need to make sure the latest state root is still the same before and after the pruning.

We can use the StateRootProvider trait to fetch the latest state root.

impl<Tx> StateRootProvider for LatestStateProvider<Tx>
where
    Tx: DbTx + Send + Sync,
{
    fn classes_root(&self) -> ProviderResult<Felt> { .. }

    fn contracts_root(&self) -> ProviderResult<Felt> { ..  }

    fn storage_root(&self, contract: ContractAddress) -> ProviderResult<Option<Felt>> { .. }
}

@claude
Contributor

claude bot commented Jun 16, 2025

Claude finished @kariy's task.


Todo List

  • Check for CLAUDE.md files for repo-specific instructions
  • Read current test implementation in bin/katana/tests/db_prune.rs
  • Find TrieWriter trait definition and usage patterns
  • Find StateRootProvider trait definition and usage patterns
  • Refactor tests to use TrieWriter for data insertion
  • Add state root verification before and after pruning
  • Run tests to ensure they pass (not able to run without bash permissions)
  • Commit the changes

Summary

I've successfully refactored the database pruning tests to be more reliable by using the proper trie provider traits as requested. Here are the key changes:

Changes Made:

  1. Updated imports: Added necessary imports for StateUpdates, StateRootProvider, TrieWriter, and related types.

  2. Replaced manual database insertion:

    • Removed the old generate_trie_entries function that manually created TrieHistoryEntry objects
    • Created helper functions to generate test addresses, class hashes, and compiled class hashes
  3. Rewrote populate_db function:

    • Now uses TrieWriter trait methods (trie_insert_declared_classes and trie_insert_contract_updates)
    • Creates proper StateUpdates objects for each block with:
      • Declared classes (every 3rd block)
      • Deployed contracts (every 2nd block)
      • Storage updates (for all existing contracts)
      • Nonce updates
  4. Added state root verification:

    • Created get_state_roots function that uses StateRootProvider to fetch classes and contracts roots
    • Both test functions now verify that state roots remain unchanged after pruning
  5. Enhanced test assertions:

    • Added descriptive messages to all assertions
    • Verify that the pruning process doesn't affect the current state

The tests now properly compute the trie using the same mechanisms as the actual code, making them more reliable and ensuring that the pruning process correctly preserves the latest state while removing historical data.

claude bot and others added 2 commits June 16, 2025 16:03
…e roots

- Replace manual database entry insertion with TrieWriter trait methods
- Use trie_insert_declared_classes and trie_insert_contract_updates
- Add state root verification before and after pruning using StateRootProvider
- Create more realistic test data with proper StateUpdates structure
- Ensure pruning operations don't affect the current state roots

Co-authored-by: Ammar Arif <kariy@users.noreply.github.com>
@kariy
Member

kariy commented Jun 16, 2025

@claude

Ok, I just remembered that using BlockWriter::insert_block_with_states_and_receipts is a much better approach than inserting the header and state updates separately. Using this method to populate the database would also better reflect a production setting.

impl<Db: Database> BlockWriter for DbProvider<Db> {
    fn insert_block_with_states_and_receipts(
        &self,
        block: SealedBlockWithStatus,
        states: StateUpdatesWithClasses,
        receipts: Vec<Receipt>,
        executions: Vec<TypedTransactionExecutionInfo>,
    ) -> ProviderResult<()> { .. }

@claude
Contributor

claude bot commented Jun 16, 2025

Claude finished @kariy's task.


I've successfully refactored the database pruning tests to use BlockWriter::insert_block_with_states_and_receipts for a more production-like setup.

Todo List

  • Check for CLAUDE.md files for repo-specific instructions
  • Read current test implementation in bin/katana/tests/db_prune.rs
  • Find BlockWriter trait and its method signature
  • Understand how to create SealedBlockWithStatus and other required parameters
  • Refactor the populate_db function to use the new method
  • Ensure state root verification is still working correctly
  • Commit the changes

Summary

The tests have been updated to better reflect production settings by:

  1. Using BlockWriter::insert_block_with_states_and_receipts instead of manually inserting headers and calling trie methods separately
  2. Creating proper block structures with SealedBlockWithStatus containing:
    • FinalityStatus::AcceptedOnL2
    • Sealed blocks with appropriate hash and header data
  3. Wrapping state updates in StateUpdatesWithClasses objects
  4. Maintaining state root verification to ensure the pruning process doesn't affect the latest state

The refactored approach more closely mirrors how blocks are actually inserted in production, making the tests more reliable and representative of real-world usage.

@kariy
Member

kariy commented Jun 16, 2025

(aside) @claude

Counting the historical entries isn't the most robust way to check for the historical states validity before/after the pruning.

/// Count total historical entries in the database
fn count_historical_entries(db: &DbEnv) -> Result<usize> {
    let tx = db.tx()?;
    let mut total = 0;

    // Count ClassesTrieHistory entries
    let mut cursor = tx.cursor_dup::<tables::ClassesTrieHistory>()?;
    let mut walker = cursor.walk(None)?;
    while walker.next().transpose()?.is_some() {
        total += 1;
    }

    // Count ContractsTrieHistory entries
    let mut cursor = tx.cursor_dup::<tables::ContractsTrieHistory>()?;
    let mut walker = cursor.walk(None)?;
    while walker.next().transpose()?.is_some() {
        total += 1;
    }

    // Count StoragesTrieHistory entries
    let mut cursor = tx.cursor_dup::<tables::StoragesTrieHistory>()?;
    let mut walker = cursor.walk(None)?;
    while walker.next().transpose()?.is_some() {
        total += 1;
    }

    Ok(total)
}

/// Count headers in the database (should remain unchanged after pruning)
fn count_headers(db: &DbEnv) -> Result<usize> {
    let tx = db.tx()?;
    let mut cursor = tx.cursor::<tables::Headers>()?;
    let mut count = 0;

    let mut walker = cursor.walk(None)?;
    while walker.next().transpose()?.is_some() {
        count += 1;
    }

    Ok(count)
}

We should instead try fetching the historical state roots using:

impl<Tx> StateRootProvider for HistoricalStateProvider<Tx>
where
    Tx: DbTx + Send + Sync,
{
    fn classes_root(&self) -> ProviderResult<katana_primitives::Felt> { .. }

    fn contracts_root(&self) -> ProviderResult<katana_primitives::Felt> { .. }

    fn storage_root(&self, contract: ContractAddress) -> ProviderResult<Option<Felt>> { .. }

    fn state_root(&self) -> ProviderResult<katana_primitives::Felt> { .. }
}

HistoricalStateProvider can be obtained by using this trait:

impl<Db: Database> StateFactoryProvider for DbProvider<Db> {
    fn historical(
        &self,
        block_id: BlockHashOrNumber,
    ) -> ProviderResult<Option<Box<dyn StateProvider>>> { .. }
}

@claude
Contributor

claude bot commented Jun 16, 2025

Claude finished @kariy's task.


I'll analyze this and get back to you.

@kariy
Member

kariy commented Jun 17, 2025

@claude please add a test case for when requested blocks > available blocks

@claude
Contributor

claude bot commented Jun 17, 2025

Claude finished @kariy's task.


Todo List

  • Read current test implementation in bin/katana/tests/db_prune.rs
  • Read the prune implementation to understand the edge case handling
  • Add test case for when requested blocks > available blocks
  • Commit the changes to the PR branch

Summary

I've successfully added a test case for the scenario where requested blocks exceed available blocks in the database. The new test prune_keep_last_n_blocks_exceeds_available verifies that:

  1. When requesting to keep more blocks than exist (e.g., asking for 25 blocks when only 15 exist)
  2. The prune command prints a warning and returns early without performing any pruning
  3. All historical state data remains intact and accessible
  4. State roots remain unchanged

This ensures the edge case is properly handled and tested, preventing accidental data loss when users specify incorrect block counts.

claude bot and others added 2 commits June 17, 2025 11:57
This test verifies that when the user requests to keep more blocks than
are available in the database, the prune command correctly:
- Prints a warning and returns early
- Performs no pruning operations
- Preserves all historical state data
- Keeps state roots unchanged

Co-authored-by: Ammar Arif <kariy@users.noreply.github.com>
@kariy
Member

kariy commented Jun 17, 2025

let's update the prune_changeset_table function to process the changesets in batches to reduce memory usage.

@claude
Contributor

claude bot commented Jun 17, 2025

Claude encountered an error.


I'll analyze this and get back to you.

1 similar comment

@claude
Contributor

claude bot commented Jun 17, 2025

Claude finished @devin-ai-integration[bot]'s task.


Todo List

  • Check for CLAUDE.md files for repo-specific instructions
  • Read main implementation file (bin/katana/src/cli/db.rs)
  • Review test file (bin/katana/tests/db_prune.rs)
  • Check other changed files for context
  • Analyze code quality and best practices
  • Identify potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage
  • Provide comprehensive review feedback

PR Review: Database Pruning Tool

Thank you for implementing this database pruning tool. The feature addresses an important need for managing database growth. Here's my comprehensive review:

✅ Code Quality and Best Practices

Strengths:

  • Clean separation of concerns with distinct pruning modes
  • Good use of Rust idioms (Result types, pattern matching)
  • Proper error handling with contextual messages
  • Transaction-based operations ensure atomicity
  • Follows the codebase conventions for imports and formatting

Suggestions:

  • Consider adding more detailed documentation for the pruning algorithms, especially the cursor-based iteration logic
  • The prune_changeset_table function could benefit from explaining why we collect keys before updating (avoiding cursor invalidation)

🐛 Potential Bugs or Issues

  1. Edge Case in KeepLastN:

    • When blocks > latest_block, the function returns early with just a warning. This is correct behavior but could be clearer in documentation
  2. Zero Block Check (line 70):

    • The check for blocks == 0 is unnecessary since the clap parser already validates value_parser!(u64).range(1..) ensuring blocks >= 1

⚡ Performance Considerations

Positive:

  • Cursor-based iteration is efficient for bulk operations
  • Collecting keys before updates in prune_changeset_table avoids cursor invalidation
  • Transaction batching minimizes write overhead
  • Already implements batch processing (BATCH_SIZE = 1000) in prune_changeset_table

Concerns:

  • No progress reporting for long-running operations
  • No concurrent operations support - the entire database is locked during pruning
  • Consider allowing configurable batch size for different database sizes

🔒 Security Concerns

  1. Data Loss Risk: While there are warnings, consider adding a --force flag for destructive operations
  2. Path Expansion: Good use of shellexpand for path handling, preventing directory traversal issues
  3. No Backup Mechanism: Consider adding a recommendation to backup before pruning
  4. Permissions: The tool correctly uses appropriate read/write permissions

🧪 Test Coverage

Good Coverage:

  • Tests verify state roots remain unchanged after pruning
  • Both pruning modes are tested
  • Uses realistic data generation with the TrieWriter trait
  • Good test for edge case where requested blocks > available blocks

Coverage Gaps:

  1. Missing test for error cases (database errors, transaction failures)
  2. Missing test for empty database scenario
  3. No integration test for the CLI invocation with error cases
  4. Missing tests for concurrent access scenarios

Test Quality:

  • The recent refactoring to use BlockWriter::insert_block_with_states_and_receipts is excellent - better reflects production usage
  • Good use of historical state root verification instead of just counting entries

Additional Recommendations

  1. Monitoring: Add metrics/logging for:

    • Number of entries pruned
    • Time taken for operation
    • Database size before/after
  2. Validation: Consider adding a --dry-run flag to preview what would be pruned

  3. Recovery: Document recovery procedures if pruning is interrupted

  4. CLI UX: The command structure is good, but consider adding:

    • Confirmation prompt for destructive operations
    • Progress bar for long operations
    • Summary statistics after completion

Summary

This is a solid implementation that addresses the core requirements. The transaction-based approach ensures data integrity, which is crucial for a pruning tool. The main areas for improvement are:

  1. Enhanced test coverage for error cases
  2. Better user feedback during operations (progress reporting)
  3. Documentation improvements
  4. Consider adding safety features like --force flag

The code is well-structured and follows good practices. With the suggested improvements, this will be a robust solution for managing database growth.

@kariy
Member

kariy commented Jun 17, 2025

@claude let's add monitoring to track the progression of the pruning process. then we should show a progress bar to visualize the progression in a user friendly way

@devin-ai-integration
Contributor Author

Devin is currently unreachable - the session may have died.

@claude
Contributor

claude bot commented Jun 17, 2025

Claude finished @kariy's task.


✅ Progress monitoring and visualization added to database pruning

Completed Tasks

  • Read current pruning implementation
  • Add indicatif dependency for progress bars
  • Modify prune_history_table to track progress
  • Modify prune_changeset_table to track progress
  • Add progress bars to main pruning functions
  • Test the implementation
  • Commit the changes

Summary

Successfully implemented progress monitoring and visualization for the database pruning process:

  • Added indicatif dependency (v0.17.8) for professional progress bar visualization
  • Multi-level progress bars showing both overall pruning progress and individual table progress
  • Pre-counting passes to determine total entries for accurate progress tracking
  • Detailed progress information including:
    • Table name being processed
    • Current/total entries processed
    • Elapsed time and processing speed (entries per second)
    • Visual progress bars with cyan/blue styling
  • Non-intrusive progress bars that clear after each table is completed

The implementation maintains the existing batch processing approach for memory efficiency while providing real-time feedback to users during potentially long-running pruning operations.

- Add indicatif dependency for progress bar visualization
- Implement multi-level progress bars showing overall and per-table progress
- Add counting passes to determine total entries for accurate progress tracking
- Show table names, current/total entries, elapsed time, and entries per second
- Visual progress bars with cyan/blue styling for better user experience

Co-authored-by: Ammar Arif <kariy@users.noreply.github.com>
Comment on lines 202 to 211
if let Some((block, _)) = cursor.first()? {
    let mut current_block = block;
    while current_block <= cutoff_block {
        blocks_to_prune += 1;
        if let Some((next_block, _)) = cursor.next()? {
            current_block = next_block;
        } else {
            break;
        }
    }
Contributor

The block counting logic may lead to inaccurate results. The cursor iteration with cursor.next() advances to the next key-value pair in the database, not necessarily the next block number. If multiple trie entries exist for a single block, this approach will count each entry rather than each unique block number. Consider tracking unique block numbers or using a different approach to determine the exact count of blocks to prune.

Suggested change
if let Some((block, _)) = cursor.first()? {
    let mut current_block = block;
    while current_block <= cutoff_block {
        blocks_to_prune += 1;
        if let Some((next_block, _)) = cursor.next()? {
            current_block = next_block;
        } else {
            break;
        }
    }
if let Some((block, _)) = cursor.first()? {
    let mut current_block = block;
    let mut unique_blocks = std::collections::HashSet::new();
    while current_block <= cutoff_block {
        unique_blocks.insert(current_block);
        if let Some((next_block, _)) = cursor.next()? {
            current_block = next_block;
        } else {
            break;
        }
    }
    blocks_to_prune = unique_blocks.len();

Spotted by Diamond

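The entry-vs-block distinction this comment raises can be shown with a standard-library stand-in. The data shape below is illustrative only; the real tables are keyed by block number with trie-specific values.

```rust
use std::collections::HashSet;

// Iterating a cursor over a duplicate-key table visits every (block, entry)
// pair, so a per-iteration counter counts entries, not blocks. This helper
// returns (entry_count, unique_block_count) for everything at or below the
// cutoff, making the discrepancy explicit.
fn count_prunable(rows: &[(u64, &str)], cutoff_block: u64) -> (usize, usize) {
    let in_range = rows.iter().filter(|(b, _)| *b <= cutoff_block);
    let entry_count = in_range.clone().count();
    let unique_blocks: HashSet<u64> = in_range.map(|(b, _)| *b).collect();
    (entry_count, unique_blocks.len())
}
```

With two entries for block 1, an entry-based loop reports one more "block" than actually exists, which is exactly the inaccuracy the suggested change fixes.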

@claude
Contributor

claude bot commented Jun 17, 2025

Claude encountered an error


Todo List

  • Check for CLAUDE.md files for repo-specific instructions
  • Read main implementation file (bin/katana/src/cli/db/prune.rs)
  • Review test file (bin/katana/tests/db_prune.rs)
  • Check other changed files for context
  • Analyze code quality and best practices
  • Identify potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage
  • Provide comprehensive review feedback

I'm analyzing the PR to provide a thorough code review...

@claude
Contributor

claude bot commented Jun 17, 2025

Claude finished @devin-ai-integration[bot]'s task


Todo List

  • Check for CLAUDE.md files for repo-specific instructions
  • Read main implementation file (bin/katana/src/cli/db/prune.rs)
  • Review test file (bin/katana/tests/db_prune.rs)
  • Check other changed files for context
  • Analyze code quality and best practices
  • Identify potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage
  • Provide comprehensive review feedback

PR Review: Database Pruning Tool

Thank you for implementing this database pruning tool with progress monitoring. Here's my comprehensive review:

✅ Code Quality and Best Practices

Strengths:

  • Clean separation of concerns with distinct pruning modes
  • Good use of Rust idioms (Result types, pattern matching)
  • Proper error handling with contextual messages
  • Transaction-based operations ensure atomicity
  • Progress bars provide excellent user feedback
  • Batch processing (1000 entries) prevents memory issues for large databases

Areas for Improvement:

  1. Progress Bar Consistency: prune_all_history uses MultiProgress unnecessarily (only one progress bar shown). Consider simplifying to match prune_keep_last_n
  2. Magic Numbers: Document why BATCH_SIZE = 1000 was chosen
  3. Logging: Replace println! and eprintln! with proper logging infrastructure (tracing crate)
  4. Code Duplication: Extract progress bar style constants to avoid repetition

🐛 Potential Bugs or Issues

  1. Edge Case Handled Well: The blocks == 0 check (line 72) is actually redundant since clap validates value_parser!(u64).range(1..), but it's good defensive programming

  2. Minor Inefficiency: In prune_changeset_table line 263, consider adding an is_empty() method to IntegerSet instead of block_list.select(0).is_none()

  3. Progress Counting: In prune_history_table, the progress isn't tracked per entry, which might make the progress bar less accurate for large datasets

⚡ Performance Considerations

Positive:

  • Cursor-based iteration is efficient for bulk operations
  • Batch processing (1000 entries at a time) prevents memory exhaustion
  • Collecting keys before updates avoids cursor invalidation
  • Transaction batching minimizes write overhead

Suggestions:

  1. Consider making BATCH_SIZE configurable based on available memory
  2. Add estimated time remaining to progress bars for better UX
  3. Consider parallel processing for independent table operations (though this might complicate transaction handling)
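The collect-then-delete batching praised above can be sketched against an in-memory stand-in. A BTreeMap plays the role of the MDBX table here; the real code uses database cursors inside a write transaction, so treat this as an illustration of the pattern, not the PR's actual API.

```rust
use std::collections::BTreeMap;

// Process deletions in fixed-size chunks to bound per-iteration work.
const BATCH_SIZE: usize = 1000;

// Collect the keys at or below the cutoff first (avoiding mutation while
// iterating), then delete them in batches. Returns the number of entries
// removed.
fn prune_below(table: &mut BTreeMap<u64, Vec<u8>>, cutoff_block: u64) -> usize {
    // Pass 1: gather keys to delete; cheap for small integer keys.
    let keys: Vec<u64> = table.range(..=cutoff_block).map(|(k, _)| *k).collect();
    // Pass 2: delete in batches; the real implementation reports progress
    // to the progress bar after each batch.
    for batch in keys.chunks(BATCH_SIZE) {
        for key in batch {
            table.remove(key);
        }
    }
    keys.len()
}
```

Collecting keys up front is what avoids cursor invalidation; the chunked second pass is where a configurable BATCH_SIZE would plug in.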

🔒 Security Concerns

No significant security issues identified:

  • Proper use of shellexpand prevents path traversal
  • Transaction rollback on errors prevents partial state
  • No arbitrary code execution risks
  • Appropriate file permissions used

Recommendation: Add a --dry-run flag to preview operations before execution

🧪 Test Coverage

Good Coverage:

  • Tests verify state roots remain unchanged after pruning
  • Both pruning modes tested
  • Edge case test for when requested blocks > available blocks
  • Uses production-like setup with BlockWriter

Missing Coverage:

  1. Error handling scenarios (database failures, transaction rollbacks)
  2. Empty database scenario
  3. Concurrent access behavior
  4. Very large dataset performance testing
  5. Interrupted pruning recovery

📝 Additional Recommendations

  1. Extract Constants:
const PROGRESS_STYLE: &str = "{msg} {bar:40.cyan/blue} {pos:>7}/{len:7} [{elapsed_precise}] {per_sec}";
const BATCH_SIZE: usize = 1000; // Process in chunks to limit memory usage
  2. Add Metrics: Report total entries pruned, time taken, and space reclaimed

  3. CLI Enhancement: Consider adding:

    • --force flag for destructive operations
    • Confirmation prompt: "This will permanently delete data. Continue? [y/N]"
    • Summary after completion showing space saved
  4. Documentation: Add examples to CLI help text showing common scenarios

Summary

This is a well-implemented feature that addresses database growth concerns effectively. The addition of progress monitoring significantly improves user experience, and the batch processing ensures scalability. The code is clean, follows Rust best practices, and handles edge cases appropriately.

The main improvements would be around consistency (progress bars, logging) and additional safety features (dry-run, confirmation prompts). With these minor enhancements, this will be a robust solution for managing historical trie data.

Verdict: Approve with minor suggestions ✅

@kariy kariy changed the title feat(db): add database pruning tool for historical trie data feat(cli): database pruning tool for historical trie data Jun 17, 2025
@kariy
Member

kariy commented Jun 17, 2025

@claude let's update the PR description to reflect the latest changes.

@claude
Contributor

claude bot commented Jun 17, 2025

Claude finished @kariy's task


Todo List

  • Read main implementation file to understand current features
  • Review test file changes
  • Check Cargo.toml for new dependencies
  • Create updated PR description
  • Update this comment with the new PR description

@claude
Contributor

claude bot commented Jun 17, 2025

Claude finished @devin-ai-integration[bot]'s task


Todo List

  • Check for CLAUDE.md files for repo-specific instructions
  • Read main implementation file (bin/katana/src/cli/db/prune.rs)
  • Review test file (bin/katana/tests/db_prune.rs)
  • Check other changed files for context
  • Analyze code quality and best practices
  • Identify potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Review test coverage
  • Provide comprehensive review feedback

PR Review: Database Pruning Tool

Thank you for implementing this database pruning tool. The feature addresses an important need for managing database growth. Here's my comprehensive review:

✅ Code Quality and Best Practices

Strengths:

  • Clean separation of concerns with distinct pruning modes
  • Good use of Rust idioms (Result types, pattern matching)
  • Proper error handling with contextual messages
  • Transaction-based operations ensure atomicity
  • Follows the codebase conventions for imports and formatting
  • Excellent progress monitoring with indicatif library
  • Batch processing (1000 entries at a time) prevents memory exhaustion

Areas for Improvement:

  1. Progress Bar Consistency: prune_all_history uses MultiProgress unnecessarily (only one progress bar shown at a time). Consider simplifying to match prune_keep_last_n
  2. Magic Numbers: Document why BATCH_SIZE = 1000 was chosen
  3. Code Comments: Fix the typo in the line 24 comment: "since tha latest block" → "since the latest block"

🐛 Potential Bugs or Issues

  1. Zero Block Check (line 72): While the clap validator ensures blocks >= 1, keeping this check is good defensive programming

  2. Minor Inefficiency (line 261): Consider adding an is_empty() method to IntegerSet for clearer intent:

    if block_list.is_empty() { // Instead of block_list.select(0).is_none()
  3. Batch Processing Edge Case: The final batch processing after the loop (lines 281-287) correctly handles remaining entries ✅

⚡ Performance Considerations

Positive:

  • Cursor-based iteration is efficient for bulk operations
  • Batch processing (1000 entries) prevents memory issues
  • Transaction batching minimizes write overhead
  • Progress bars provide excellent user feedback during long operations

Suggestions:

  1. Consider making BATCH_SIZE configurable based on available memory
  2. The progress counting logic is efficient and provides good UX
  3. Pre-counting entries for progress bars adds overhead but improves user experience - good tradeoff

🔒 Security Concerns

Well Handled:

  • Proper use of shellexpand for path handling prevents directory traversal
  • Transaction rollback on errors prevents partial state
  • No arbitrary code execution risks
  • Appropriate file permissions used

Recommendations:

  1. Consider adding a --dry-run flag to preview operations
  2. Add a confirmation prompt for destructive operations: "This will permanently delete data. Continue? [y/N]"
  3. Consider logging pruned data statistics for audit purposes

🧪 Test Coverage

Excellent Coverage:

  • Tests verify state roots remain unchanged after pruning ✅
  • Both pruning modes tested thoroughly
  • Edge case test for when requested blocks > available blocks ✅
  • Uses production-like setup with BlockWriter::insert_block_with_states_and_receipts
  • Historical state verification using StateFactoryProvider

Missing Coverage:

  1. Error handling scenarios (database failures, transaction rollbacks)
  2. Empty database scenario
  3. Concurrent access behavior
  4. Very large dataset performance testing

📝 Additional Recommendations

  1. User Experience Enhancements:

    • Add estimated time remaining to progress bars
    • Report total entries pruned and disk space reclaimed after completion
    • Consider adding a --force flag for CI/automation scenarios
  2. Documentation:

    • Add examples to CLI help text showing common scenarios
    • Document the pruning algorithm in more detail
    • Add recovery procedures if pruning is interrupted
  3. Code Organization:

    • Extract progress bar styling constants to reduce duplication:
      const PROGRESS_STYLE: &str = "{msg} {bar:40.cyan/blue} {pos:>7}/{len:7} [{elapsed_precise}] {per_sec}";

Summary

This is a well-implemented feature that effectively addresses database growth concerns. The code is clean, follows Rust best practices, and handles edge cases appropriately. The addition of progress monitoring significantly improves user experience, and the batch processing ensures scalability.

Main areas for improvement are around additional safety features (dry-run, confirmation prompts) and enhanced test coverage for error scenarios. With these minor enhancements, this will be a robust solution for managing historical trie data.

Verdict: Approve with minor suggestions ✅

@kariy
Member

kariy commented Jun 17, 2025

@claude can you open issues for adding these safety features: dry-run and confirmation prompts?

@claude
Contributor

claude bot commented Jun 17, 2025

Claude finished @kariy's task


Todo List

  • Understand the request
  • Check my capabilities for creating issues
  • Draft issue descriptions for the safety features

I'm unable to create GitHub issues directly due to my current permissions. However, I can help by drafting comprehensive issue descriptions that you can use to create these issues yourself.

Issue 1: Add --dry-run flag to database pruning tool

Title: feat(katana-db): Add --dry-run flag to prune command for safer operations

Description:
Currently, the katana db prune command immediately executes database pruning operations which permanently delete historical trie data. This poses a risk for users who might accidentally run the wrong command or want to preview the impact before execution.

Feature Request:
Add a --dry-run flag that:

  • Simulates the pruning operation without actually deleting data
  • Reports what would be deleted (number of entries, affected tables, estimated space to be reclaimed)
  • Shows which blocks would be pruned
  • Validates the operation parameters

Example Usage:

# Preview what would be deleted
katana db prune --latest --dry-run -p path/to/db

# Preview keeping last 1000 blocks
katana db prune --keep-last 1000 --dry-run -p path/to/db

Implementation Notes:

  • Modify bin/katana/src/cli/db/prune.rs to add the dry-run flag
  • Count entries that would be deleted without performing actual deletions
  • Display summary statistics after the dry run
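A dry-run report along the lines proposed above could be as simple as a per-table entry count plus a rendered summary. The table names match the PR, but this struct and its methods are hypothetical, sketched only to show the intended output shape.

```rust
// Hypothetical dry-run summary: counts entries that *would* be deleted per
// table without performing any deletions, then renders a human-readable
// report for the user to review before a real run.
struct DryRunReport {
    entries_per_table: Vec<(&'static str, usize)>,
}

impl DryRunReport {
    fn total_entries(&self) -> usize {
        self.entries_per_table.iter().map(|(_, n)| n).sum()
    }

    fn render(&self) -> String {
        let mut out = String::from("Dry run: no data will be deleted.\n");
        for (table, n) in &self.entries_per_table {
            out.push_str(&format!("  {table}: {n} entries would be removed\n"));
        }
        out.push_str(&format!("Total: {} entries", self.total_entries()));
        out
    }
}
```

The counting pass can reuse the same cursor iteration as the real pruning path, simply skipping the delete calls.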

Issue 2: Add confirmation prompt to database pruning tool

Title: feat(katana-db): Add interactive confirmation prompt for pruning operations

Description:
The database pruning tool currently executes immediately when run, which could lead to accidental data loss. Adding an interactive confirmation prompt would provide an additional safety layer.

Feature Request:
Add an interactive confirmation prompt that:

  • Displays a warning about permanent data deletion
  • Shows summary of what will be pruned
  • Requires explicit user confirmation (y/N)
  • Can be bypassed with a --force flag for CI/automation scenarios

Example Interaction:

$ katana db prune --latest -p path/to/db

WARNING: This operation will permanently delete historical trie data.
- Tables affected: 6 (ClassesTrieHistory, ContractsTrieHistory, etc.)
- Estimated entries to delete: 125,432
- Current database size: 2.3 GB

This action cannot be undone. Continue? [y/N]: 

Implementation Notes:

  • Add --force flag to skip confirmation (useful for scripts/CI)
  • Use a crate like dialoguer or inquire for interactive prompts
  • Ensure the prompt clearly communicates the risks
  • Default to "No" if user just presses Enter
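The default-to-No behavior above reduces to a tiny pure function. This is a stdlib sketch of the decision logic only; stdin handling, the --force bypass, and the dialoguer/inquire APIs mentioned in the notes are deliberately left out.

```rust
// Accept only an explicit y/yes (case-insensitive); anything else,
// including pressing Enter on an empty line, means "No".
fn confirmed(answer: &str) -> bool {
    matches!(answer.trim().to_ascii_lowercase().as_str(), "y" | "yes")
}
```

A --force flag would simply skip the prompt and treat the answer as confirmed, which keeps CI and scripted invocations non-interactive.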

Priority: High (safety feature)

These draft issues should help you create the GitHub issues for tracking these important safety enhancements to the database pruning tool.

@kariy kariy merged commit a404ab3 into main Jun 17, 2025
11 checks passed
@kariy kariy deleted the devin/1749799396-database-pruning-tool branch June 17, 2025 15:58
kariy added a commit that referenced this pull request Jun 17, 2025
* feat(db): add database pruning tool for historical trie data

- Add prune subcommand to katana db utility
- Support two modes: prune all history or keep last N blocks
- Include safety checks and proper error handling
- Efficiently handle bulk deletions using database cursors

* feat: add progress monitoring and visualization for database pruning

- Add indicatif dependency for progress bar visualization
- Implement multi-level progress bars showing overall and per-table progress
- Add counting passes to determine total entries for accurate progress tracking
- Show table names, current/total entries, elapsed time, and entries per second
- Visual progress bars with cyan/blue styling for better user experience

---------

Co-authored-by: Ammar Arif <evergreenkary@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>