-
-
Notifications
You must be signed in to change notification settings - Fork 23
Fix GSP to detect non-contiguous ordered subsequences #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Updated is_subsequence_in_list to use two-pointer approach for ordered matching - Modified tests to reflect correct GSP behavior for non-contiguous patterns - Added test_non_contiguous_subsequences to verify the fix - Updated module docstrings to reflect the change Co-authored-by: jacksonpradolima <[email protected]>
I reviewed PR #116. The main difference is the approach: PR #116: Added a Current PR: Modified the core algorithm to always use non-contiguous matching, aligning with canonical GSP semantics. This is simpler and fixes the reported bug directly. Both approaches are valid. PR #116 offers more flexibility, while the current approach is simpler and follows the principle that GSP should detect ordered subsequences by default (as stated in the issue). If you prefer the parameterized approach from PR #116, I can adapt that implementation. However, I'd recommend clarifying in the issue whether:
The current implementation passes all tests and correctly detects patterns like |
|
@copilot Ok, should we add more tests to show the differences (contiguous and non-continuous) working? |
…tterns - Added test_contiguous_vs_non_contiguous_patterns to show patterns found in both modes - Added test_non_contiguous_with_longer_gaps to test matching with large gaps - Added test_order_sensitivity to verify order requirements - Added test_is_subsequence_contiguous_vs_non_contiguous in utils tests - Added test_is_subsequence_with_gaps to test various gap sizes - All 44 tests pass (5 new tests added) Co-authored-by: jacksonpradolima <[email protected]>
Added 5 comprehensive tests demonstrating the differences between contiguous and non-contiguous pattern matching (commit 860869c): GSP-level tests (
Utils-level tests (
All 44 tests pass. The tests clearly demonstrate:
|
|
@copilot, all is working, but the README isn't updated properly. |
- Added ordered (non-contiguous) matching to Key Features section - Updated sample output to reflect additional patterns detected - Added detailed "Understanding Non-Contiguous Pattern Matching" section with example - Updated notes to clarify patterns are matched in order, not contiguously - Clarified examples to explain gap tolerance in pattern matching Co-authored-by: jacksonpradolima <[email protected]>
Updated README.md with comprehensive documentation (commit 0501c91): Key Features section: Added non-contiguous matching as the first feature with example Sample Output: Updated to show the actual patterns detected with non-contiguous matching (e.g., New section "Understanding Non-Contiguous Pattern Matching": Added with:
Updated notes: Clarified that patterns are matched "in order (not necessarily contiguously)" throughout the documentation |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical bug in the GSP algorithm where the is_subsequence_in_list function was using slice-based contiguous matching instead of ordered (non-contiguous) subsequence detection. This caused the algorithm to miss valid patterns like ('a', 'c') in sequence ['a', 'b', 'c'], violating standard GSP semantics.
Key Changes:
- Replaced slice-based contiguous matching with a two-pointer ordered subsequence detection algorithm
- Updated test expectations to reflect additional non-contiguous patterns now correctly detected
- Added comprehensive test coverage for non-contiguous pattern matching scenarios
- Updated documentation to clearly explain ordered (non-contiguous) matching behavior
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| gsppy/utils.py | Replaced slice-based contiguous matching with two-pointer algorithm for ordered (non-contiguous) subsequence detection while maintaining O(n) time complexity |
| tests/test_utils.py | Updated and expanded tests to verify ordered matching behavior, including tests for gaps of various sizes and order sensitivity |
| tests/test_gsp.py | Updated expected values to reflect additional non-contiguous patterns and added 5 comprehensive tests demonstrating pattern detection with gaps |
| README.md | Updated documentation with clear explanations of non-contiguous matching, practical examples, and a new dedicated section explaining the matching behavior |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.



closes #115
The
is_subsequence_in_listfunction was using slice-based matching, which only detected contiguous subsequences. This caused GSP to miss valid patterns like('a', 'c')in sequence['a', 'b', 'c'], violating standard GSP algorithm semantics.Changes
Core algorithm (
gsppy/utils.py):Tests:
test_is_subsequence_in_listto verify ordered (non-contiguous) matching behaviortest_frequent_patternsexpected values to reflect additional patterns now detectedtest_non_contiguous_subsequencescovering the reported casetest_contiguous_vs_non_contiguous_patternsdemonstrating patterns found in both modes vs only non-contiguoustest_non_contiguous_with_longer_gapstesting matching with gaps of varying sizestest_order_sensitivityverifying order requirements are enforcedtest_is_subsequence_contiguous_vs_non_contiguousdocumenting matching differences at utils leveltest_is_subsequence_with_gapstesting gap sizes from 1 to 8+ elementsDocumentation:
Example
Test Coverage
The comprehensive test suite clearly demonstrates:
('X', 'Y')in['X', 'Y', 'Z']('X', 'Z')in['X', 'Y', 'Z']('E', 'A')in['A', 'E']are correctly rejectedAll 44 tests pass (39 original + 5 new comprehensive tests).
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.