Commit 94089cc

feat: add flexible pruning strategy system to GSP algorithm
2 parents e2c1be0 + 6222945 commit 94089cc

9 files changed: +1724 -7 lines changed

README.md

Lines changed: 135 additions & 4 deletions
@@ -638,6 +638,140 @@ result = gsp.search(min_support=0.5)

---

## 🔧 Flexible Candidate Pruning

GSP-Py supports **flexible candidate pruning strategies** that let you customize how candidate sequences are filtered during pattern mining, so you can tune mining to your dataset's characteristics and requirements.

### Built-in Pruning Strategies

#### 1. Support-Based Pruning (Default)

The standard GSP pruning, based on a minimum support threshold:

```python
from gsppy.gsp import GSP
from gsppy.pruning import SupportBasedPruning

# Explicit support-based pruning
pruner = SupportBasedPruning(min_support_fraction=0.3)
gsp = GSP(transactions, pruning_strategy=pruner)
result = gsp.search(min_support=0.3)
```

#### 2. Frequency-Based Pruning

Prunes candidates based on absolute frequency (a minimum number of occurrences):

```python
from gsppy.gsp import GSP
from gsppy.pruning import FrequencyBasedPruning

# Require patterns to appear at least 5 times
pruner = FrequencyBasedPruning(min_frequency=5)
gsp = GSP(transactions, pruning_strategy=pruner)
result = gsp.search(min_support=0.2)
```

**Use case**: When patterns must occur at least a fixed number of times, regardless of dataset size.
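
To make the difference between a relative and an absolute threshold concrete, the sketch below compares the two on datasets of different sizes. It does not use GSP-Py: `fraction_threshold` is a hypothetical helper, and rounding the fractional threshold up with `math.ceil` is an assumption made for illustration, not a statement of how GSP-Py converts a fraction to a count.

```python
import math


def fraction_threshold(total_transactions: int, min_support: float) -> int:
    """Smallest occurrence count that meets a fractional support threshold.

    Hypothetical helper; rounding up is an assumption, not GSP-Py's documented rule.
    """
    return math.ceil(total_transactions * min_support)


MIN_FREQUENCY = 5  # absolute threshold, independent of dataset size

for n in (10, 100, 10_000):
    relative = fraction_threshold(n, 0.2)
    print(
        f"{n:>6} transactions: min_support=0.2 needs {relative} occurrences, "
        f"min_frequency needs {MIN_FREQUENCY}"
    )
```

The fractional threshold grows with the dataset, while the absolute one stays fixed, which is why the two strategies can disagree on small or very large inputs.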

#### 3. Temporal-Aware Pruning

Optimizes pruning for time-constrained pattern mining by pre-filtering infeasible patterns:

```python
from gsppy.gsp import GSP
from gsppy.pruning import TemporalAwarePruning

# Prune patterns that cannot satisfy temporal constraints
pruner = TemporalAwarePruning(
    mingap=1,
    maxgap=5,
    maxspan=10,
    min_support_fraction=0.3,
)
gsp = GSP(timestamped_transactions, mingap=1, maxgap=5, maxspan=10, pruning_strategy=pruner)
result = gsp.search(min_support=0.3)
```

**Use case**: Speeds up temporal pattern mining by eliminating patterns that cannot satisfy the temporal constraints.
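
As an illustration of what "cannot satisfy temporal constraints" can mean, the sketch below shows one feasibility check that needs no support counting: a pattern with `k` elements has `k - 1` gaps, so its shortest possible span is `(k - 1) * mingap`, and if that exceeds `maxspan` no occurrence can exist. The function name `is_temporally_feasible` is hypothetical, the check assumes gaps of exactly `mingap` are allowed, and whether `TemporalAwarePruning` applies this particular test is not stated here.

```python
from typing import Optional


def is_temporally_feasible(
    pattern_length: int,
    mingap: Optional[float] = None,
    maxspan: Optional[float] = None,
) -> bool:
    """Return False when no occurrence of the pattern can fit inside maxspan.

    Hypothetical helper for illustration; not part of GSP-Py.
    """
    if mingap is None or maxspan is None or pattern_length < 2:
        return True
    # k elements have k - 1 gaps, each at least mingap long (inclusive, by assumption).
    shortest_possible_span = (pattern_length - 1) * mingap
    return shortest_possible_span <= maxspan


for k in (2, 11, 12):
    print(k, is_temporally_feasible(k, mingap=1, maxspan=10))
```

With `mingap=1` and `maxspan=10`, as in the example above, this check would rule out any pattern longer than 11 elements before its support is counted.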

#### 4. Combined Pruning

Combines multiple pruning strategies for more aggressive filtering:

```python
from gsppy.gsp import GSP
from gsppy.pruning import CombinedPruning, SupportBasedPruning, FrequencyBasedPruning

# Apply both support and frequency constraints
strategies = [
    SupportBasedPruning(min_support_fraction=0.3),
    FrequencyBasedPruning(min_frequency=5),
]
pruner = CombinedPruning(strategies)
gsp = GSP(transactions, pruning_strategy=pruner)
result = gsp.search(min_support=0.3)
```

**Use case**: When you want to combine multiple filtering criteria for more selective pattern discovery.
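
The sketch below shows one plausible way such a combination can behave: a candidate is pruned as soon as any wrapped strategy votes to prune it. This "any" semantics is an assumption consistent with the "aggressive filtering" description, not a confirmed detail of `CombinedPruning`. The classes `MinFraction`, `MinCount`, and `AnyOf` are hypothetical stand-ins written so the example runs without GSP-Py; only the `should_prune` signature follows the interface documented under Custom Pruning Strategies.

```python
from typing import Dict, Optional, Protocol, Sequence, Tuple


class Pruner(Protocol):
    def should_prune(
        self,
        candidate: Tuple[str, ...],
        support_count: int,
        total_transactions: int,
        context: Optional[Dict] = None,
    ) -> bool: ...


class MinFraction:
    """Stand-in: prune when support is below a fraction of all transactions."""

    def __init__(self, fraction: float) -> None:
        self.fraction = fraction

    def should_prune(self, candidate, support_count, total_transactions, context=None) -> bool:
        return support_count < self.fraction * total_transactions


class MinCount:
    """Stand-in: prune when support is below an absolute count."""

    def __init__(self, count: int) -> None:
        self.count = count

    def should_prune(self, candidate, support_count, total_transactions, context=None) -> bool:
        return support_count < self.count


class AnyOf:
    """Prune when at least one wrapped strategy prunes (assumed semantics)."""

    def __init__(self, strategies: Sequence[Pruner]) -> None:
        self.strategies = list(strategies)

    def should_prune(self, candidate, support_count, total_transactions, context=None) -> bool:
        return any(
            s.should_prune(candidate, support_count, total_transactions, context)
            for s in self.strategies
        )


combined = AnyOf([MinFraction(0.3), MinCount(5)])
# 4 of 10 transactions clears the 30% fraction but falls short of the count of 5.
print(combined.should_prune(("a", "b"), support_count=4, total_transactions=10))
# 6 of 10 transactions satisfies both thresholds.
print(combined.should_prune(("a", "b"), support_count=6, total_transactions=10))
```

Under this reading, adding strategies can only shrink the result set, so each extra strategy trades recall for fewer surviving candidates.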

### Custom Pruning Strategies

You can create custom pruning strategies by implementing the `PruningStrategy` interface:

```python
from typing import Dict, Optional, Tuple

from gsppy.gsp import GSP
from gsppy.pruning import PruningStrategy


class MyCustomPruner(PruningStrategy):
    def should_prune(
        self,
        candidate: Tuple[str, ...],
        support_count: int,
        total_transactions: int,
        context: Optional[Dict] = None,
    ) -> bool:
        # Custom pruning logic: return True to prune (filter out), False to keep.
        pattern_length = len(candidate)
        # Example: prune very long patterns with low support
        if pattern_length > 5 and support_count < 10:
            return True
        return False


# Use your custom pruner
custom_pruner = MyCustomPruner()
gsp = GSP(transactions, pruning_strategy=custom_pruner)
result = gsp.search(min_support=0.2)
```
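
Because the decision in this example depends only on the arguments passed to `should_prune`, the rule can be checked in isolation before it is wired into `GSP`. The sketch below restates the same condition as a plain function so it runs without GSP-Py installed; `prune_long_rare` is a hypothetical name used only for this illustration.

```python
from typing import Dict, Optional, Tuple


def prune_long_rare(
    candidate: Tuple[str, ...],
    support_count: int,
    total_transactions: int,
    context: Optional[Dict] = None,
) -> bool:
    """Same rule as the example above: drop patterns longer than 5 items seen fewer than 10 times."""
    return len(candidate) > 5 and support_count < 10


long_pattern = ("a", "b", "c", "d", "e", "f")  # 6 items
short_pattern = ("a", "b")

print(prune_long_rare(long_pattern, support_count=3, total_transactions=100))
print(prune_long_rare(long_pattern, support_count=10, total_transactions=100))
print(prune_long_rare(short_pattern, support_count=3, total_transactions=100))
```

Only the first call meets both conditions (long and rare), so it is the only candidate this rule would discard.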

### Performance Characteristics

Each pruning strategy makes a different performance tradeoff:

| Strategy | Pruning Aggressiveness | Use Case | Performance Impact |
|----------|----------------------|----------|-------------------|
| **SupportBased** | Moderate | General-purpose mining | Baseline performance |
| **FrequencyBased** | High (for large datasets) | Patterns that must meet an absolute frequency | Faster on large datasets |
| **TemporalAware** | High (for temporal data) | Time-constrained patterns | Significant speedup for temporal mining |
| **Combined** | Very High | Selective pattern discovery | Fastest, but may miss edge cases |

### Benchmarking Pruning Strategies

To compare pruning strategies on your dataset:

```bash
# Compare all strategies
python benchmarks/bench_pruning.py --n_tx 1000 --vocab 100 --min_support 0.2 --strategy all

# Benchmark a specific strategy
python benchmarks/bench_pruning.py --n_tx 1000 --vocab 100 --min_support 0.2 --strategy frequency

# Run multiple rounds for averaging
python benchmarks/bench_pruning.py --n_tx 1000 --vocab 100 --min_support 0.2 --strategy all --rounds 3
```

See `benchmarks/bench_pruning.py` for the complete benchmarking script.

---

## ⌨️ Typing

`gsppy` ships inline type information (PEP 561) via a bundled `py.typed` marker. The public API is re-exported from
@@ -651,10 +785,7 @@ larger applications.

We are actively working to improve GSP-Py. Here are some exciting features planned for future releases:

-1. **Custom Filters for Candidate Pruning**:
-   - Enable users to define their own pruning logic during the mining process.
-
-2. **Support for Preprocessing and Postprocessing**:
+1. **Support for Preprocessing and Postprocessing**:
   - Add hooks to allow users to transform datasets before mining and customize the output results.

Want to contribute or suggest an
