@@ -638,6 +638,140 @@ result = gsp.search(min_support=0.5)
638 638
639 639 ---
640 640
641+ ## 🔧 Flexible Candidate Pruning
642+
643+ GSP-Py supports **flexible candidate pruning strategies** that let you customize how candidate sequences are filtered during pattern mining. This lets you tune pruning to different dataset characteristics and mining requirements.
644+
645+ ### Built-in Pruning Strategies
646+
647+ #### 1. Support-Based Pruning (Default)
648+
649+ Standard GSP pruning based on a minimum support threshold:
650+
651+ ```python
652+ from gsppy.gsp import GSP
653+ from gsppy.pruning import SupportBasedPruning
654+
655+ # Explicit support-based pruning
656+ pruner = SupportBasedPruning(min_support_fraction=0.3)
657+ gsp = GSP(transactions, pruning_strategy=pruner)
658+ result = gsp.search(min_support=0.3)
659+ ```
660+
661+ #### 2. Frequency-Based Pruning
662+
663+ Prunes candidates based on absolute frequency (minimum number of occurrences):
664+
665+ ```python
666+ from gsppy.pruning import FrequencyBasedPruning
667+
668+ # Require patterns to appear at least 5 times
669+ pruner = FrequencyBasedPruning(min_frequency=5)
670+ gsp = GSP(transactions, pruning_strategy=pruner)
671+ result = gsp.search(min_support=0.2)
672+ ```
673+
674+ **Use case**: When you need patterns to occur a minimum absolute number of times, regardless of dataset size.
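
To see how the fractional threshold used by support-based pruning relates to the absolute count used by frequency-based pruning, the sketch below converts one into the other for a small and a large dataset. It does not call GSP-Py: `support_to_count` and `count_to_support` are hypothetical helper names, and rounding up to a whole number of transactions is an assumption made for illustration, not a statement about the library's internals.

```python
import math


def support_to_count(min_support: float, n_transactions: int) -> int:
    """Smallest number of transactions that reaches a fractional threshold."""
    if not 0.0 < min_support <= 1.0:
        raise ValueError("min_support must be in (0, 1]")
    # round() guards against float noise such as 0.1 * 30 == 3.0000000000000004
    return math.ceil(round(min_support * n_transactions, 9))


def count_to_support(min_frequency: int, n_transactions: int) -> float:
    """Fraction of the dataset that an absolute count corresponds to."""
    return min_frequency / n_transactions


small, large = 20, 1000
print(support_to_count(0.3, small))  # 6 transactions
print(support_to_count(0.3, large))  # 300 transactions
print(count_to_support(5, small))    # 0.25
print(count_to_support(5, large))    # 0.005
```

The same fractional threshold demands very different absolute counts as the dataset grows, while a fixed count of 5 becomes a far weaker filter on the larger dataset.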
675+
676+ #### 3. Temporal-Aware Pruning
677+
678+ Optimizes pruning for time-constrained pattern mining by pre-filtering infeasible patterns:
679+
680+ ```python
681+ from gsppy.pruning import TemporalAwarePruning
682+
683+ # Prune patterns that cannot satisfy temporal constraints
684+ pruner = TemporalAwarePruning(
685+     mingap=1,
686+     maxgap=5,
687+     maxspan=10,
688+     min_support_fraction=0.3,
689+ )
690+ gsp = GSP(timestamped_transactions, mingap=1, maxgap=5, maxspan=10, pruning_strategy=pruner)
691+ result = gsp.search(min_support=0.3)
692+ ```
693+
694+ **Use case**: Improves performance for temporal pattern mining by eliminating patterns that cannot satisfy temporal constraints.
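
One way such a pre-filter could work: if consecutive elements must be at least `mingap` apart, a pattern of k elements spans at least (k - 1) * `mingap` time units, so any pattern whose minimum span exceeds `maxspan` can never match. The sketch below illustrates only that arithmetic. `is_temporally_feasible` is a hypothetical helper, and the check is an assumption about what "infeasible" could mean, not the actual logic of `TemporalAwarePruning`.

```python
from typing import Optional, Tuple


def is_temporally_feasible(
    candidate: Tuple[str, ...],
    mingap: Optional[float] = None,
    maxspan: Optional[float] = None,
) -> bool:
    """Return False when the candidate cannot fit inside maxspan at all."""
    if mingap is None or maxspan is None:
        return True  # nothing to check without both constraints
    # k elements have k - 1 gaps, each at least mingap wide (assumed semantics)
    min_possible_span = (len(candidate) - 1) * mingap
    return min_possible_span <= maxspan


print(is_temporally_feasible(("A", "B", "C"), mingap=1, maxspan=10))         # True
print(is_temporally_feasible(tuple("ABCDEFGHIJKL"), mingap=1, maxspan=10))   # False
```

A check like this needs no pass over the data, which is why it can discard candidates before their support is counted.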
695+
696+ #### 4. Combined Pruning
697+
698+ Combines multiple pruning strategies for aggressive filtering:
699+
700+ ```python
701+ from gsppy.pruning import CombinedPruning, SupportBasedPruning, FrequencyBasedPruning
702+
703+ # Apply both support and frequency constraints
704+ strategies = [
705+     SupportBasedPruning(min_support_fraction=0.3),
706+     FrequencyBasedPruning(min_frequency=5),
707+ ]
708+ pruner = CombinedPruning(strategies)
709+ gsp = GSP(transactions, pruning_strategy=pruner)
710+ result = gsp.search(min_support=0.3)
711+ ```
712+
713+ **Use case**: When you want to combine multiple filtering criteria for more selective pattern discovery.
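
The effect of applying both constraints can be sketched without GSP-Py. The example assumes a candidate is pruned as soon as any one rule rejects it, which is the reading suggested by "apply both support and frequency constraints" but is not confirmed against the implementation of `CombinedPruning`. The names `below_support`, `below_frequency`, and `combine` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# A rule takes (candidate, support_count, total_transactions)
# and returns True when the candidate should be pruned.
PruneRule = Callable[[Tuple[str, ...], int, int], bool]


def below_support(min_support_fraction: float) -> PruneRule:
    return lambda cand, count, total: count / total < min_support_fraction


def below_frequency(min_frequency: int) -> PruneRule:
    return lambda cand, count, total: count < min_frequency


def combine(rules: Sequence[PruneRule]) -> PruneRule:
    # Assumed semantics: prune if ANY rule says prune
    return lambda cand, count, total: any(r(cand, count, total) for r in rules)


rule = combine([below_support(0.3), below_frequency(5)])
print(rule(("A", "B"), 4, 10))   # True: 40% support, but only 4 occurrences
print(rule(("A", "B"), 5, 10))   # False: passes both constraints
print(rule(("A", "B"), 5, 100))  # True: 5 occurrences, but only 5% support
```

Under this reading, a combined pruner never keeps more candidates than its strictest member, which matches the "more selective" use case above.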
714+
715+ ### Custom Pruning Strategies
716+
717+ You can create custom pruning strategies by implementing the `PruningStrategy` interface:
718+
719+ ```python
720+ from gsppy.pruning import PruningStrategy
721+ from typing import Dict, Optional, Tuple
722+
723+ class MyCustomPruner(PruningStrategy):
724+     def should_prune(
725+         self,
726+         candidate: Tuple[str, ...],
727+         support_count: int,
728+         total_transactions: int,
729+         context: Optional[Dict] = None,
730+     ) -> bool:
731+         # Custom pruning logic
732+         # Return True to prune (filter out), False to keep
733+         pattern_length = len(candidate)
734+         # Example: prune very long patterns with low support
735+         if pattern_length > 5 and support_count < 10:
736+             return True
737+         return False
738+
739+ # Use your custom pruner
740+ custom_pruner = MyCustomPruner()
741+ gsp = GSP(transactions, pruning_strategy=custom_pruner)
742+ result = gsp.search(min_support=0.2)
743+ ```
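
Because `should_prune` is an ordinary method, a custom rule can be checked in isolation before it is plugged into a mining run. The sketch below assumes the method signature shown above. To stay runnable without GSP-Py installed, it declares its own stand-in `PruningStrategy` base class, which is a hypothetical stub and not the library's class; `LongLowSupportPruner` and its parameters are likewise illustrative.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class PruningStrategy(ABC):
    """Stand-in stub for illustration only; not gsppy.pruning.PruningStrategy."""

    @abstractmethod
    def should_prune(
        self,
        candidate: Tuple[str, ...],
        support_count: int,
        total_transactions: int,
        context: Optional[Dict] = None,
    ) -> bool: ...


class LongLowSupportPruner(PruningStrategy):
    def __init__(self, max_length: int = 5, min_count: int = 10) -> None:
        self.max_length = max_length
        self.min_count = min_count

    def should_prune(self, candidate, support_count, total_transactions, context=None):
        # Prune only patterns that are both long and rare
        return len(candidate) > self.max_length and support_count < self.min_count


pruner = LongLowSupportPruner()
short = ("A", "B")
long_ = ("A", "B", "C", "D", "E", "F")
print(pruner.should_prune(short, 3, 100))   # False: short patterns are kept
print(pruner.should_prune(long_, 3, 100))   # True: long and rare
print(pruner.should_prune(long_, 25, 100))  # False: long but frequent
```

Making the thresholds constructor arguments keeps the rule reusable across datasets instead of hard-coding 5 and 10.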
744+
745+ ### Performance Characteristics
746+
747+ Different pruning strategies have different performance tradeoffs:
748+
749+ | Strategy | Pruning Aggressiveness | Use Case | Performance Impact |
750+ |----------|------------------------|----------|--------------------|
751+ | **SupportBased** | Moderate | General-purpose mining | Baseline performance |
752+ | **FrequencyBased** | High (for large datasets) | Require absolute frequency | Faster on large datasets |
753+ | **TemporalAware** | High (for temporal data) | Time-constrained patterns | Significant speedup for temporal mining |
754+ | **Combined** | Very High | Selective pattern discovery | Fastest, but may miss edge cases |
755+
756+ ### Benchmarking Pruning Strategies
757+
758+ To compare pruning strategies on your dataset:
759+
760+ ```bash
761+ # Compare all strategies
762+ python benchmarks/bench_pruning.py --n_tx 1000 --vocab 100 --min_support 0.2 --strategy all
763+
764+ # Benchmark a specific strategy
765+ python benchmarks/bench_pruning.py --n_tx 1000 --vocab 100 --min_support 0.2 --strategy frequency
766+
767+ # Run multiple rounds for averaging
768+ python benchmarks/bench_pruning.py --n_tx 1000 --vocab 100 --min_support 0.2 --strategy all --rounds 3
769+ ```
770+
771+ See `benchmarks/bench_pruning.py` for the complete benchmarking script.
772+
773+ ---
774+
641 775 ## ⌨️ Typing
642 776
643 777 `gsppy` ships inline type information (PEP 561) via a bundled `py.typed` marker. The public API is re-exported from
@@ -651,10 +785,7 @@ larger applications.
651 785
652 786 We are actively working to improve GSP-Py. Here are some exciting features planned for future releases:
653 787
654- 1. **Custom Filters for Candidate Pruning**:
655-    - Enable users to define their own pruning logic during the mining process.
656-
657- 2. **Support for Preprocessing and Postprocessing**:
788+ 1. **Support for Preprocessing and Postprocessing**:
658 789    - Add hooks to allow users to transform datasets before mining and customize the output results.
659 790
660 791 Want to contribute or suggest an