feat(cli): database pruning tool for historical trie data #123
|
Original prompt from karii@cartridge.gg: and this is how the historical trie is being stored. The The trie is committed by calling Your job is to create a database pruning tool. The tool should be part of the You only need to look in the following repo: dojoengine/katana |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #123      +/-   ##
==========================================
+ Coverage   73.27%   73.52%   +0.24%
==========================================
  Files         201      205       +4
  Lines       22115    22267     +152
==========================================
+ Hits        16205    16372     +167
+ Misses       5910     5895      -15
|
- Add prune subcommand to katana db utility
- Support two modes: prune all history or keep last N blocks
- Include safety checks and proper error handling
- Efficiently handle bulk deletions using database cursors
Co-Authored-By: karii@cartridge.gg <evergreenkary@gmail.com>

…d import
- Fix double reference issue in get_latest_block_number call
- Change function signature to use DbTx instead of DbTxRef
- Remove unused DbTxRef import to clean up warnings
Co-Authored-By: karii@cartridge.gg <evergreenkary@gmail.com>

- Add custom Arbitrary implementation for TrieDatabaseKey with 256-byte limit
- Add arbitrary feature to katana-db Cargo.toml with optional dependency
- Create comprehensive integration tests in pruning_tests.rs
- Test pruning modes: all history removal and keep-last N blocks
- Verify current trie state preservation during pruning operations
- Test edge cases including empty database and key generation limits
- All tests pass with proper type conversions and trait imports
Co-Authored-By: karii@cartridge.gg <evergreenkary@gmail.com>

- Replace manual generate_test_key function with TrieDatabaseKey::arbitrary()
- Update populate_test_data to use Arbitrary-generated keys for better test coverage
- Remove unused TrieDatabaseKeyType import
- All tests continue to pass with randomly generated keys respecting 256-byte constraint
- Demonstrates Arbitrary trait implementation works correctly in practice
Co-Authored-By: karii@cartridge.gg <evergreenkary@gmail.com>
Force-pushed from 98f0ed7 to d3d27e6
|
Claude finished @devin-ai-integration[bot]'s task
Todo List
PR Review: Database Pruning Tool
Thank you for implementing this database pruning tool. The feature addresses an important need for managing database growth. Here's my comprehensive review:
✅ Code Quality and Best Practices
Strengths:
Suggestions:
🐛 Potential Bugs or Issues
⚡ Performance Considerations
Positive:
Concerns:
Suggestion: Consider processing changesets in batches to reduce memory usage:
const BATCH_SIZE: usize = 1000;
let mut batch = Vec::with_capacity(BATCH_SIZE);
// Process in batches...
🔒 Security Concerns
🧪 Test Coverage
Coverage Gaps:
Test Improvements Needed:
Additional Recommendations
Summary
This is a solid implementation that addresses the core requirements. The main issues are minor (typos, test fixes) with some opportunities for enhancement around memory efficiency and test coverage. The transaction-based approach ensures data integrity, which is crucial for a pruning tool.
Priority fixes:
Overall, good work on implementing this important feature! With these improvements, it will be a robust solution for managing database growth. |
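The batching suggestion in the review above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: `delete_in_batches` and the `delete_batch` callback are hypothetical names standing in for a cursor-based delete against the database, and only `BATCH_SIZE` is taken from the review.

```rust
/// Batch size suggested in the review comment.
const BATCH_SIZE: usize = 1000;

/// Feeds `keys` to `delete_batch` in chunks of at most `BATCH_SIZE`, so that
/// only one chunk is held in memory at a time.
///
/// `delete_batch` is a hypothetical stand-in for the real deletion routine.
/// Returns the total number of keys handed to it.
fn delete_in_batches<K, I, F>(keys: I, mut delete_batch: F) -> usize
where
    I: IntoIterator<Item = K>,
    F: FnMut(&[K]),
{
    let mut batch: Vec<K> = Vec::with_capacity(BATCH_SIZE);
    let mut total = 0;

    for key in keys {
        batch.push(key);
        // Flush as soon as the buffer is full, then reuse the allocation.
        if batch.len() == BATCH_SIZE {
            delete_batch(&batch);
            total += batch.len();
            batch.clear();
        }
    }

    // Flush the final, partially filled batch.
    if !batch.is_empty() {
        delete_batch(&batch);
        total += batch.len();
    }

    total
}
```

Reusing one buffer keeps peak memory bounded by the batch size regardless of how many changeset entries exist, which is the concern the review raises.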
|
@claude Let's update the tests (in
impl<Db: Database> TrieWriter for DbProvider<Db> {
    fn trie_insert_declared_classes(
        &self,
        block_number: BlockNumber,
        updates: &BTreeMap<ClassHash, CompiledClassHash>,
    ) -> ProviderResult<Felt> { .. }

    fn trie_insert_contract_updates(
        &self,
        block_number: BlockNumber,
        state_updates: &StateUpdates,
    ) -> ProviderResult<Felt> { .. }
}
To ensure the pruning process is done correctly, we need to make sure the latest state root is still the same before and after the pruning. We can use the
impl<Tx> StateRootProvider for LatestStateProvider<Tx>
where
    Tx: DbTx + Send + Sync,
{
    fn classes_root(&self) -> ProviderResult<Felt> { .. }
    fn contracts_root(&self) -> ProviderResult<Felt> { .. }
    fn storage_root(&self, contract: ContractAddress) -> ProviderResult<Option<Felt>> { .. }
} |
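The invariant requested in this comment (the latest state root must be identical before and after pruning) can be illustrated with a toy model. Everything below is an assumption made for illustration: `ToyDb`, `latest_root`, `prune_all_history`, and `root_preserved` are hypothetical, and a plain hash of the current entries stands in for the real trie root that `StateRootProvider` would return.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Toy stand-in for the database: the current trie entries plus
/// per-block historical snapshots. Not Katana's actual layout.
struct ToyDb {
    latest: BTreeMap<u64, u64>,
    history: BTreeMap<u64, BTreeMap<u64, u64>>,
}

impl ToyDb {
    /// Stand-in for a state root: a hash over the current entries only.
    fn latest_root(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.latest.hash(&mut hasher);
        hasher.finish()
    }

    /// Removes every historical snapshot and leaves the current state alone.
    fn prune_all_history(&mut self) {
        self.history.clear();
    }
}

/// Runs `prune` and reports whether the latest root survived unchanged.
fn root_preserved(db: &mut ToyDb, prune: impl FnOnce(&mut ToyDb)) -> bool {
    let before = db.latest_root();
    prune(db);
    before == db.latest_root()
}
```

In a real test the two root reads would go through the provider traits quoted above; the shape of the check (read, prune, read, compare) stays the same.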
|
Claude finished @kariy's task
Todo List
Summary
I've successfully refactored the database pruning tests to be more reliable by using the proper trie provider traits as requested. Here are the key changes:
Changes Made:
The tests now properly compute the trie using the same mechanisms as the actual code, making them more reliable and ensuring that the pruning process correctly preserves the latest state while removing historical data. |
…e roots
- Replace manual database entry insertion with TrieWriter trait methods
- Use trie_insert_declared_classes and trie_insert_contract_updates
- Add state root verification before and after pruning using StateRootProvider
- Create more realistic test data with proper StateUpdates structure
- Ensure pruning operations don't affect the current state roots
Co-authored-by: Ammar Arif <kariy@users.noreply.github.com>
|
OK, I just remembered that using
impl<Db: Database> BlockWriter for DbProvider<Db> {
    fn insert_block_with_states_and_receipts(
        &self,
        block: SealedBlockWithStatus,
        states: StateUpdatesWithClasses,
        receipts: Vec<Receipt>,
        executions: Vec<TypedTransactionExecutionInfo>,
    ) -> ProviderResult<()> { .. }
} |
|
Claude finished @kariy's task
I've successfully refactored the database pruning tests to use
Todo List
Summary
The tests have been updated to better reflect production settings by:
The refactored approach more closely mirrors how blocks are actually inserted in production, making the tests more reliable and representative of real-world usage. |
|
(aside) @claude Counting the historical entries isn't the most robust way to check the validity of the historical states before and after pruning. We should instead try fetching the historical state roots using:
impl<Db: Database> StateFactoryProvider for DbProvider<Db> {
    fn historical(
        &self,
        block_id: BlockHashOrNumber,
    ) -> ProviderResult<Option<Box<dyn StateProvider>>> { .. }
} |
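A check along the lines requested here might look like the sketch below: instead of counting rows, it asks for the historical root of every block and compares the answers against a snapshot taken before pruning. The types are toy stand-ins (`ToyHistory`, `historical_root`, `prune_up_to`, and `history_is_consistent` are all hypothetical), and the assumption that a pruned block simply yields `None` is an illustration, not a statement about how `StateFactoryProvider::historical` behaves.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for historical trie data: one root per block number.
#[derive(Clone)]
struct ToyHistory {
    roots: BTreeMap<u64, u64>,
}

impl ToyHistory {
    /// Stand-in for fetching a historical state root.
    fn historical_root(&self, block: u64) -> Option<u64> {
        self.roots.get(&block).copied()
    }

    /// Drops every block at or below `cutoff_block`.
    fn prune_up_to(&mut self, cutoff_block: u64) {
        self.roots.retain(|block, _| *block > cutoff_block);
    }
}

/// Verifies that pruned blocks are gone and retained blocks still return
/// exactly the root they returned before pruning.
fn history_is_consistent(
    before: &ToyHistory,
    after: &ToyHistory,
    cutoff_block: u64,
    latest_block: u64,
) -> bool {
    (0..=latest_block).all(|block| {
        if block <= cutoff_block {
            after.historical_root(block).is_none()
        } else {
            after.historical_root(block) == before.historical_root(block)
        }
    })
}
```

Comparing per-block answers catches a pruner that deletes the right number of entries from the wrong blocks, which a row count alone would miss.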
|
@claude please add a test case for when requested blocks > available blocks |
|
Claude finished @kariy's task
Todo List
Summary
I've successfully added a test case for the scenario where requested blocks exceed available blocks in the database. The new test
This ensures the edge case is properly handled and tested, preventing accidental data loss when users specify incorrect block counts. |
This test verifies that when the user requests to keep more blocks than are available in the database, the prune command correctly:
- Prints a warning and returns early
- Performs no pruning operations
- Preserves all historical state data
- Keeps state roots unchanged
Co-authored-by: Ammar Arif <kariy@users.noreply.github.com>
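The early-return behaviour described in this commit can be sketched with a small helper. The name `prune_cutoff` and the exact rule (cutoff = latest block minus the number of blocks to keep, with blocks at or below the cutoff being pruned) are assumptions for illustration; the PR's actual condition and warning text are not shown in this conversation.

```rust
/// Returns the highest block number whose history should be pruned, or
/// `None` when there is nothing to prune because the caller asked to keep
/// at least as many blocks as exist.
///
/// Assumes blocks are numbered `0..=latest_block` and that every block at
/// or below the returned cutoff is pruned.
fn prune_cutoff(latest_block: u64, keep_last: u64) -> Option<u64> {
    // `checked_sub` yields `None` instead of underflowing when
    // `keep_last > latest_block`, which maps onto "warn and return early".
    latest_block.checked_sub(keep_last)
}
```

With this rule, `prune_cutoff(10, 3)` gives `Some(7)` (blocks 8, 9, and 10 are kept), while `prune_cutoff(10, 11)` gives `None`, so the caller can print its warning and leave the database untouched.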
|
let's update the |
|
Claude encountered an error. I'll analyze this and get back to you. |
|
Claude finished @devin-ai-integration[bot]'s task
Todo List
PR Review: Database Pruning Tool
Thank you for implementing this database pruning tool. The feature addresses an important need for managing database growth. Here's my comprehensive review:
✅ Code Quality and Best Practices
Strengths:
Suggestions:
🐛 Potential Bugs or Issues
⚡ Performance Considerations
Positive:
Concerns:
🔒 Security Concerns
🧪 Test Coverage
Good Coverage:
Coverage Gaps:
Test Quality:
Additional Recommendations
Summary
This is a solid implementation that addresses the core requirements. The transaction-based approach ensures data integrity, which is crucial for a pruning tool. The main areas for improvement are:
The code is well-structured and follows good practices. With the suggested improvements, this will be a robust solution for managing database growth. |
|
@claude let's add monitoring to track the progress of the pruning process, then show a progress bar to visualize it in a user-friendly way |
|
Devin is currently unreachable - the session may have died. |
|
Claude finished @kariy's task
✅ Progress monitoring and visualization added to database pruning
Completed Tasks
Summary
Successfully implemented progress monitoring and visualization for the database pruning process:
The implementation maintains the existing batch processing approach for memory efficiency while providing real-time feedback to users during potentially long-running pruning operations. |
- Add indicatif dependency for progress bar visualization
- Implement multi-level progress bars showing overall and per-table progress
- Add counting passes to determine total entries for accurate progress tracking
- Show table names, current/total entries, elapsed time, and entries per second
- Visual progress bars with cyan/blue styling for better user experience
Co-authored-by: Ammar Arif <kariy@users.noreply.github.com>
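The commit above relies on a counting pass so that each bar knows its total before deletion starts. The idea can be shown without the `indicatif` crate using a plain text renderer; `render_progress` is a hypothetical helper written only for this illustration and is not how `indicatif` is used in the PR.

```rust
/// Renders a one-line text progress bar such as
/// `ClassesTrieHistory [#####-----] 5/10`.
///
/// `len` is the total obtained from a prior counting pass and `pos` is the
/// number of entries processed so far. A zero total renders as complete.
fn render_progress(label: &str, pos: u64, len: u64, width: usize) -> String {
    let filled = if len == 0 {
        width
    } else {
        // Clamp `pos` to `len` and use u128 to avoid overflow in the product.
        ((pos.min(len) as u128 * width as u128) / len as u128) as usize
    };
    format!(
        "{} [{}{}] {}/{}",
        label,
        "#".repeat(filled),
        "-".repeat(width - filled),
        pos,
        len
    )
}
```

The cost of this design is an extra read over each table before any deletion, in exchange for a bar that shows a meaningful fraction rather than a spinner.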
bin/katana/src/cli/db/prune.rs
Outdated
    if let Some((block, _)) = cursor.first()? {
        let mut current_block = block;
        while current_block <= cutoff_block {
            blocks_to_prune += 1;
            if let Some((next_block, _)) = cursor.next()? {
                current_block = next_block;
            } else {
                break;
            }
        }
The block counting logic may lead to inaccurate results. The cursor iteration with cursor.next() advances to the next key-value pair in the database, not necessarily the next block number. If multiple trie entries exist for a single block, this approach will count each entry rather than each unique block number. Consider tracking unique block numbers or using a different approach to determine the exact count of blocks to prune.
Suggested change:
-   if let Some((block, _)) = cursor.first()? {
-       let mut current_block = block;
-       while current_block <= cutoff_block {
-           blocks_to_prune += 1;
-           if let Some((next_block, _)) = cursor.next()? {
-               current_block = next_block;
-           } else {
-               break;
-           }
-       }
+   if let Some((block, _)) = cursor.first()? {
+       let mut current_block = block;
+       let mut unique_blocks = std::collections::HashSet::new();
+       while current_block <= cutoff_block {
+           unique_blocks.insert(current_block);
+           if let Some((next_block, _)) = cursor.next()? {
+               current_block = next_block;
+           } else {
+               break;
+           }
+       }
+       blocks_to_prune = unique_blocks.len();
Spotted by Diamond
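If the cursor yields entries in ascending block order (an assumption about the table's key ordering, not something confirmed in this thread), the distinct-block count raised in the comment above can be computed without a `HashSet` by comparing each key with the previous one. The function name `count_unique_blocks` is hypothetical, and a plain iterator of block numbers stands in for the database cursor.

```rust
/// Counts distinct block numbers at or below `cutoff_block`.
///
/// Assumes `blocks` is sorted in ascending order, as a cursor over a table
/// keyed by block number would typically produce. Duplicate keys for the
/// same block are counted once.
fn count_unique_blocks<I>(blocks: I, cutoff_block: u64) -> u64
where
    I: IntoIterator<Item = u64>,
{
    let mut count = 0u64;
    let mut previous: Option<u64> = None;

    for block in blocks {
        // Sorted input means nothing past the cutoff can qualify later.
        if block > cutoff_block {
            break;
        }
        // A new block starts whenever the key differs from the last one seen.
        if previous != Some(block) {
            count += 1;
            previous = Some(block);
        }
    }

    count
}
```

This keeps memory constant and returns a `u64`, which sidesteps the `usize` result of `HashSet::len()` in the suggested change.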
|
Claude encountered an error
Todo List
|
|
Claude finished @devin-ai-integration[bot]'s task
Todo List
PR Review: Database Pruning Tool
Thank you for implementing this database pruning tool with progress monitoring. Here's my comprehensive review:
✅ Code Quality and Best Practices
Strengths:
Areas for Improvement:
🐛 Potential Bugs or Issues
⚡ Performance Considerations
Positive:
Suggestions:
🔒 Security Concerns
No significant security issues identified:
Recommendation: Add a
🧪 Test Coverage
Good Coverage:
Missing Coverage:
📝 Additional Recommendations
const PROGRESS_STYLE: &str = "{msg} {bar:40.cyan/blue} {pos:>7}/{len:7} [{elapsed_precise}] {per_sec}";
const BATCH_SIZE: usize = 1000; // Process in chunks to limit memory usage
Summary
This is a well-implemented feature that addresses database growth concerns effectively. The addition of progress monitoring significantly improves user experience, and the batch processing ensures scalability. The code is clean, follows Rust best practices, and handles edge cases appropriately. The main improvements would be around consistency (progress bars, logging) and additional safety features (dry-run, confirmation prompts). With these minor enhancements, this will be a robust solution for managing historical trie data.
Verdict: Approve with minor suggestions ✅ |
|
@claude let's update the PR description to reflect the latest changes to the PR |
|
Claude finished @devin-ai-integration[bot]'s task
Todo List
PR Review: Database Pruning Tool
Thank you for implementing this database pruning tool. The feature addresses an important need for managing database growth. Here's my comprehensive review:
✅ Code Quality and Best Practices
Strengths:
Areas for Improvement:
🐛 Potential Bugs or Issues
⚡ Performance Considerations
Positive:
Suggestions:
🔒 Security Concerns
Well Handled:
Recommendations:
🧪 Test Coverage
Excellent Coverage:
Missing Coverage:
📝 Additional Recommendations
Summary
This is a well-implemented feature that effectively addresses database growth concerns. The code is clean, follows Rust best practices, and handles edge cases appropriately. The addition of progress monitoring significantly improves user experience, and the batch processing ensures scalability. Main areas for improvement are around additional safety features (dry-run, confirmation prompts) and enhanced test coverage for error scenarios. With these minor enhancements, this will be a robust solution for managing historical trie data.
Verdict: Approve with minor suggestions ✅ |
|
@claude can you open issues for adding these safety features: dry-run and confirmation prompts? |
|
Claude finished @kariy's task
Todo List
I'm unable to create GitHub issues directly due to my current permissions. However, I can help by drafting comprehensive issue descriptions that you can use to create these issues yourself.
Issue 1: Add --dry-run flag to database pruning tool
Title: feat(katana-db): Add --dry-run flag to prune command for safer operations
Description:
Feature Request:
Example Usage:
# Preview what would be deleted
katana db prune --latest --dry-run -p path/to/db
# Preview keeping last 1000 blocks
katana db prune --keep-last 1000 --dry-run -p path/to/db
Implementation Notes:
Issue 2: Add confirmation prompt to database pruning tool
Title: feat(katana-db): Add interactive confirmation prompt for pruning operations
Description:
Feature Request:
Example Interaction:
Implementation Notes:
Priority: High (safety feature)
These draft issues should help you create the GitHub issues for tracking these important safety enhancements to the database pruning tool. |
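The confirmation prompt proposed in the second draft issue could take roughly the following shape. This is a sketch under stated assumptions: the function name `confirm`, the `[y/N]` wording, and the default-to-no behaviour are all hypothetical, since the draft's example interaction is not included in this conversation. The reader and writer are generic so the logic can be exercised without a terminal.

```rust
use std::io::{self, BufRead, Write};

/// Writes `prompt` followed by ` [y/N]: ` and reads one line of input.
///
/// Returns `true` only for an explicit `y` or `yes` (case-insensitive).
/// Empty input and end-of-input both count as "no", so a destructive
/// operation is never the default.
fn confirm<R: BufRead, W: Write>(mut input: R, mut output: W, prompt: &str) -> io::Result<bool> {
    write!(output, "{} [y/N]: ", prompt)?;
    output.flush()?;

    let mut answer = String::new();
    input.read_line(&mut answer)?;

    let answer = answer.trim().to_ascii_lowercase();
    Ok(answer == "y" || answer == "yes")
}
```

In a command-line tool the call site would pass locked standard input and standard output, and a `--dry-run` style flag could bypass the prompt entirely because nothing is deleted in that mode.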
* feat(db): add database pruning tool for historical trie data
- Add prune subcommand to katana db utility
- Support two modes: prune all history or keep last N blocks
- Include safety checks and proper error handling
- Efficiently handle bulk deletions using database cursors

* feat: add progress monitoring and visualization for database pruning
- Add indicatif dependency for progress bar visualization
- Implement multi-level progress bars showing overall and per-table progress
- Add counting passes to determine total entries for accurate progress tracking
- Show table names, current/total entries, elapsed time, and entries per second
- Visual progress bars with cyan/blue styling for better user experience

---------
Co-authored-by: Ammar Arif <evergreenkary@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Database Pruning Tool for Historical Trie Data
Overview
This PR implements a database pruning tool for the katana db utility command to address the exponential growth of historical trie data in the Katana database. The tool provides two pruning modes to help manage database size while preserving data integrity.
Features
Implementation Details
CLI Interface
Database Tables Affected
The pruning tool operates on six trie-related tables:
- ClassesTrieHistory, ContractsTrieHistory, StoragesTrieHistory
- ClassesTrieChangeSet, ContractsTrieChangeSet, StoragesTrieChangeSet
Safety Features
- ClassesTrie, ContractsTrie, StoragesTrie)
Progress Visualization
- indicatif (v0.17.8) for professional progress bars
Testing
- cargo build --bin katana
- scripts/clippy.sh
- cargo +nightly fmt --all
Test Improvements
- BlockWriter::insert_block_with_states_and_receipts for production-like setup
- StateFactoryProvider to check historical state accessibility
Technical Implementation
- DbTxMut for write transactions with proper commit/rollback handling
- IntegerSet (RoaringTreemap) data structure for changeset tables
Files Changed
- bin/katana/src/cli/db/prune.rs: Added pruning functionality with progress monitoring
- bin/katana/tests/db_prune.rs: Comprehensive test suite with state root verification
- bin/katana/Cargo.toml: Added indicatif dependency for progress bars
Link to Devin run
https://app.devin.ai/sessions/746969faebe44c02aa1ee1ff518951fb
Requested by
karii@cartridge.gg