Commit 01ca37b

feat: introduce Sequence abstraction for typed pattern representation
2 parents f1ae2af + e2e9a3f commit 01ca37b

11 files changed: +1732 -9 lines changed

README.md

Lines changed: 76 additions & 0 deletions
@@ -486,6 +486,82 @@ Verbose mode provides:

 For complete documentation on logging, see [docs/logging.md](docs/logging.md).

+### Using Sequence Objects for Rich Pattern Representation
+
+GSP-Py 4.0+ introduces a **Sequence abstraction class** that provides a richer, more maintainable way to work with sequential patterns. The Sequence class encapsulates pattern items, support counts, and optional metadata in an immutable, hashable object.
+
+#### Traditional Dict-based Output (Default)
+
+```python
+from gsppy import GSP
+
+transactions = [
+    ['Bread', 'Milk'],
+    ['Bread', 'Diaper', 'Beer', 'Eggs'],
+    ['Milk', 'Diaper', 'Beer', 'Coke']
+]
+
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3)
+
+# Returns one dict per pattern length, e.g. [{('Bread',): 2, ('Milk',): 2, ...}, {('Diaper', 'Beer'): 2, ...}, ...]
+for level_patterns in result:
+    for pattern, support in level_patterns.items():
+        print(f"Pattern: {pattern}, Support: {support}")
+```
+
+#### Sequence Objects (New Feature)
+
+```python
+from gsppy import GSP
+
+transactions = [
+    ['Bread', 'Milk'],
+    ['Bread', 'Diaper', 'Beer', 'Eggs'],
+    ['Milk', 'Diaper', 'Beer', 'Coke']
+]
+
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3, return_sequences=True)
+
+# Returns one list per pattern length, e.g. [[Sequence(('Bread',), support=2), ...], [Sequence(('Diaper', 'Beer'), support=2), ...], ...]
+for level_patterns in result:
+    for seq in level_patterns:
+        print(f"Pattern: {seq.items}, Support: {seq.support}, Length: {seq.length}")
+        # Access sequence properties
+        print(f"  First item: {seq.first_item}, Last item: {seq.last_item}")
+        # Check if item is in sequence
+        if "Milk" in seq:
+            print("  Contains Milk!")
+```
+
+#### Key Benefits of Sequence Objects
+
+1. **Rich API**: Access pattern properties like `length`, `first_item`, `last_item`
+2. **Type Safety**: IDE autocomplete and better type hints
+3. **Immutable & Hashable**: Can be used as dictionary keys
+4. **Extensible**: Add metadata for confidence, lift, or custom properties
+5. **Backward Compatible**: Convert to/from dict format as needed
+
+```python
+from gsppy import Sequence, sequences_to_dict, dict_to_sequences
+
+# Create custom sequences
+seq = Sequence.from_tuple(("A", "B", "C"), support=5)
+
+# Extend sequences
+extended = seq.extend("D")  # Creates Sequence(("A", "B", "C", "D"))
+
+# Add metadata
+seq_with_meta = seq.with_metadata(confidence=0.85, lift=1.5)
+
+# Convert between formats for compatibility (gsp is the GSP instance from the example above)
+seq_result = gsp.search(min_support=0.3, return_sequences=True)
+dict_format = sequences_to_dict(seq_result[0])  # Convert to dict
+```
+
+For a complete example, see [examples/sequence_example.py](examples/sequence_example.py).
+
 ### Loading SPM/GSP Format Files

 GSP-Py supports loading datasets in the classical SPM/GSP delimiter format, which is widely used in sequential pattern mining research. This format uses:
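
Note on the README hunk above: it describes Sequence as immutable and hashable, but the hunks shown here do not include the class itself. A minimal sketch of how such an object could be built, assuming a frozen dataclass. `SketchSequence`, and the choice to reset support to zero in `extend`, are illustrative assumptions and not gsppy's implementation:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SketchSequence:
    """Hypothetical stand-in for gsppy's Sequence; not the library's code."""

    items: Tuple[str, ...]
    support: int = 0

    @property
    def length(self) -> int:
        return len(self.items)

    def extend(self, item: str) -> "SketchSequence":
        # Build a new object instead of mutating this one. Support is reset
        # because the longer pattern has not been counted yet (an assumption).
        return SketchSequence(self.items + (item,), support=0)

    def __contains__(self, item: object) -> bool:
        return item in self.items


seq = SketchSequence(("A", "B", "C"), support=5)
longer = seq.extend("D")

# frozen=True together with the default eq=True generates __hash__ from the
# fields, so instances can be used as dictionary keys.
labels = {seq: "seed", longer: "candidate"}
```

Because the dataclass is frozen, an assignment such as `seq.support = 10` raises an error, which is why operations in this style return new objects rather than editing in place.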

docs/api.md

Lines changed: 24 additions & 0 deletions
@@ -9,6 +9,30 @@
         - search
       show_submodules: false

+## Sequence
+
+::: gsppy.sequence.Sequence
+    options:
+      members:
+        - from_tuple
+        - from_item
+        - extend
+        - with_support
+        - with_metadata
+        - as_tuple
+        - length
+        - first_item
+        - last_item
+      show_submodules: false
+
+## Sequence Utilities
+
+::: gsppy.sequence.sequences_to_dict
+
+::: gsppy.sequence.dict_to_sequences
+
+::: gsppy.sequence.to_sequence
+
 ## Acceleration utilities

 ::: gsppy.accelerate.support_counts
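
Note on the docs/api.md hunk above: it registers the converters `sequences_to_dict` and `dict_to_sequences` without showing what they do. A minimal sketch of the round trip they suggest, assuming a pattern is just an item tuple plus a support count. `PatternRecord`, `records_to_dict`, and `dict_to_records` are hypothetical names for illustration, not the gsppy functions:

```python
from typing import Dict, List, NamedTuple, Tuple


class PatternRecord(NamedTuple):
    """Hypothetical stand-in for a Sequence: items plus a support count."""

    items: Tuple[str, ...]
    support: int


def records_to_dict(records: List[PatternRecord]) -> Dict[Tuple[str, ...], int]:
    # Key each pattern by its item tuple, as in the dict-based output.
    return {rec.items: rec.support for rec in records}


def dict_to_records(patterns: Dict[Tuple[str, ...], int]) -> List[PatternRecord]:
    # Dicts preserve insertion order, so the list order follows the dict.
    return [PatternRecord(items, support) for items, support in patterns.items()]


level_1 = {("Bread",): 2, ("Milk",): 2}
records = dict_to_records(level_1)
round_trip = records_to_dict(records)
```

In this sketch the round trip is lossless only because a record carries nothing beyond items and support; any extra metadata on a richer object would have no place in the dict form.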

docs/index.md

Lines changed: 24 additions & 0 deletions
@@ -6,6 +6,7 @@ Pattern (GSP) algorithm. Use this site to install the library, explore the CLI,
 ## Highlights

 - **Sequence mining** with support-based pruning and candidate generation.
+- **Sequence abstraction** for typed pattern representation with rich metadata support.
 - **Multiple data formats** including JSON, CSV, SPM/GSP, Parquet, and Arrow.
 - **Token mapping utilities** for transparent string ↔ integer conversion.
 - **Optional acceleration** via Rust and GPU backends.
@@ -33,6 +34,29 @@ for level, freq_patterns in enumerate(patterns, start=1):
     print(f"Level {level}: {freq_patterns}")
 ```

+### Using Sequence Objects (New in 4.0+)
+
+Get richer pattern representation with typed Sequence objects:
+
+```python
+from gsppy import GSP
+
+transactions = [
+    ["Bread", "Milk"],
+    ["Bread", "Diaper", "Beer", "Eggs"],
+    ["Milk", "Diaper", "Beer", "Coke"],
+]
+
+# Enable Sequence objects
+patterns = GSP(transactions).search(min_support=0.3, return_sequences=True)
+
+for level_patterns in patterns:
+    for seq in level_patterns:
+        print(f"{seq.items} (support={seq.support}, length={seq.length})")
+        if "Milk" in seq:
+            enriched = seq.with_metadata(confidence=0.85)
+```
+
 ### Loading from SPM/GSP Format

 ```python

docs/usage.md

Lines changed: 109 additions & 0 deletions
@@ -138,6 +138,115 @@ model = GSP(transactions)
 frequent = model.search(min_support=0.5)
 ```

+## Using Sequence Objects
+
+GSP-Py 4.0+ introduces a **Sequence abstraction** that provides a richer, more maintainable way to work with sequential patterns. The Sequence class encapsulates pattern items, support counts, and optional metadata.
+
+### Traditional Dict-based Output (Default)
+
+By default, GSP returns patterns as dictionaries mapping tuples to support counts:
+
+```python
+from gsppy import GSP
+
+transactions = [
+    ["Bread", "Milk"],
+    ["Bread", "Diaper", "Beer", "Eggs"],
+    ["Milk", "Diaper", "Beer", "Coke"],
+]
+
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3)
+
+# Returns one dict per pattern length, e.g. [{('Bread',): 2, ('Milk',): 2, ...}, ...]
+for level_patterns in result:
+    for pattern, support in level_patterns.items():
+        print(f"Pattern: {pattern}, Support: {support}")
+```
+
+### Sequence Objects (New Feature)
+
+Enable the new Sequence objects by setting `return_sequences=True`:
+
+```python
+from gsppy import GSP
+
+transactions = [
+    ["Bread", "Milk"],
+    ["Bread", "Diaper", "Beer", "Eggs"],
+    ["Milk", "Diaper", "Beer", "Coke"],
+]
+
+gsp = GSP(transactions)
+result = gsp.search(min_support=0.3, return_sequences=True)
+
+# Returns one list per pattern length, e.g. [[Sequence(('Bread',), support=2), ...], ...]
+for level_patterns in result:
+    for seq in level_patterns:
+        print(f"Pattern: {seq.items}")
+        print(f"Support: {seq.support}")
+        print(f"Length: {seq.length}")
+
+        # Rich API
+        if "Milk" in seq:
+            print("Contains Milk!")
+```
+
+### Sequence Properties and Methods
+
+The Sequence class provides a rich API:
+
+```python
+from gsppy import Sequence
+
+# Create sequences
+seq = Sequence.from_tuple(("A", "B", "C"), support=5)
+
+# Access properties
+print(seq.items)       # ('A', 'B', 'C')
+print(seq.support)     # 5
+print(seq.length)      # 3
+print(seq.first_item)  # 'A'
+print(seq.last_item)   # 'C'
+
+# Operations
+extended = seq.extend("D")  # Sequence(('A', 'B', 'C', 'D'))
+updated = seq.with_support(10)
+enriched = seq.with_metadata(confidence=0.85, lift=1.5)
+
+# Check membership
+if "B" in seq:
+    print("Sequence contains B")
+
+# Iterate over items
+for item in seq:
+    print(item)
+
+# Convert to tuple for compatibility
+tuple_form = seq.as_tuple()  # ('A', 'B', 'C')
+```
+
+### Converting Between Formats
+
+Use utility functions to convert between Sequence objects and dict format:
+
+```python
+from gsppy import sequences_to_dict, dict_to_sequences
+
+# Get results as Sequences (gsp is the GSP instance from the example above)
+result = gsp.search(min_support=0.3, return_sequences=True)
+
+# Convert to dict format for compatibility
+dict_format = sequences_to_dict(result[0])
+# e.g. {('Bread',): 2, ('Milk',): 2, ...}
+
+# Convert back to Sequences
+sequences = dict_to_sequences(dict_format)
+# e.g. [Sequence(('Bread',), support=2), ...]
+```
+
+For a complete example demonstrating all Sequence features, see `examples/sequence_example.py` in the repository.
+
 ## Verbose Mode

 Enable detailed logging to track algorithm progress and debug issues:
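
Note on the docs/usage.md examples above: their `# Returns:` comments report a support count per pattern. A quick way to sanity-check the single-item counts by hand, independent of gsppy, assuming support means the number of transactions that contain the item. How gsppy rounds a fractional `min_support` is not shown in this diff, so the plain `>=` comparison below is also an assumption:

```python
from collections import Counter

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
]

# Count each item once per transaction, however often it repeats inside it.
item_support = Counter(item for tx in transactions for item in set(tx))

# Keep the items that meet a fractional threshold (assumed comparison rule).
min_support = 0.5
threshold = min_support * len(transactions)
frequent = {item: n for item, n in item_support.items() if n >= threshold}
```

With three transactions no item can have a support above 3, which is a useful bound when reading example output.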
